I'm trying to create a defaultdict within a defaultdict based on a key value. My thinking may be completely wrong here, but here's the code for a basic defaultdict:
from collections import Counter, defaultdict

def record():
    return {
        'count': 0,
        'key1': Counter(),
    }

my_record = defaultdict(record)
But what if I want to add a key as a defaultdict, like this:
def record():
    return {
        'count': 0,
        'key1': Counter(),
        'key2': {
            'count': 0,
            'nested_key1': Counter()
        }
    }
In the above, how could I make 'key2' a defaultdict? Is this even possible, or am I approaching the problem the wrong way?
You definitely can have a "recursive" defaultdict:
>>> from collections import defaultdict
>>> def record():
...     return {
...         'key': defaultdict(record)
...     }
...
>>> d = defaultdict(record)
>>>
>>> d['foo']
{'key': defaultdict(<function record at 0x10b396f50>, {})}
>>> d['foo']['key']['bar']
{'key': defaultdict(<function record at 0x10b396f50>, {})}
>>> d
defaultdict(<function record at 0x10b396f50>, {'foo': {'key': defaultdict(<function record at 0x10b396f50>, {'bar': {'key': defaultdict(<function record at 0x10b396f50>, {})}})}})
However, swapping out the key names at different levels would probably require some special-casing and would make the code a bit messier.
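To tie this back to the structure in the question, one possibility is a separate factory function per level; a minimal sketch, assuming you want 'key2' to auto-create its inner records (the names level_one and level_two are mine, not from the question):

from collections import Counter, defaultdict

def level_two():
    # factory for the inner 'key2' records
    return {'count': 0, 'nested_key1': Counter()}

def level_one():
    # factory for the outer records; 'key2' is itself a defaultdict
    return {'count': 0, 'key1': Counter(), 'key2': defaultdict(level_two)}

records = defaultdict(level_one)
records['foo']['key2']['bar']['count'] += 1  # the inner record is created on demand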
I have some records like this in my collection:

{
    '_id': 1,
    'test_field': [{'key1': 'value1'}, {'key2': 'value2'}]
}

test_field is a list of dicts. I need to push a new dict into that list if its key does not exist, and if it does exist I need to update that key's value.
Examples:
{'key1': 'test_value'} → 'test_field': [{'key1': 'test_value'}, {'key2': 'value2'}]
{'test_key': 'test_value2'} → 'test_field': [{'key1': 'value1'}, {'key2': 'value2'}, {'test_key': 'test_value2'}]
Help please
If you need a Python function to do this, something like the following might work for you:
def modify_test_field(my_dict, test_field, new_key, new_val):
    # drop any existing dict that already contains new_key...
    my_dict[test_field] = [obj for obj in my_dict[test_field] if new_key not in obj]
    # ...then append a fresh {new_key: new_val} entry at the end
    my_dict[test_field].append({new_key: new_val})
Call it like modify_test_field(orig_dict, 'test_field', new_key, new_val).
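A quick usage sketch, assuming orig_dict looks like the record in the question; note that an updated key's dict moves to the end of the list, which differs slightly from the ordering shown in the question's examples:

orig_dict = {'_id': 1, 'test_field': [{'key1': 'value1'}, {'key2': 'value2'}]}

modify_test_field(orig_dict, 'test_field', 'key1', 'test_value')
print(orig_dict['test_field'])  # [{'key2': 'value2'}, {'key1': 'test_value'}]

modify_test_field(orig_dict, 'test_field', 'test_key', 'test_value2')
print(orig_dict['test_field'])
# [{'key2': 'value2'}, {'key1': 'test_value'}, {'test_key': 'test_value2'}]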
I have N dataframes, from each of which I'm extracting df['col'].value_counts() and converting these to a dictionary so I have:
my_dict = {'key1' : val1, 'key2' : val2, ... , 'keyM' : valM}
How do I update my_dict so that:
If a new dataframe D has the same key as a previous dataframe (e.g. 'key1'), it should add the value to val1. In other words, if 'key1' had a value of 21 and the new dataframe has a value of 18 for the same key, the dictionary entry should become 'key1': 39.
If however, the key does not exist, then it should create a new key with the relevant value.
Does that make sense? I feel like I'm overcomplicating this...
collections.Counter is built for this.
from collections import Counter
c1 = Counter(my_dict)
c2 = Counter(my_other_dict)
c_sum = c1 + c2
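A minimal sketch with the numbers from the question (my_dict and my_other_dict are assumed names). One caveat: Counter's + operator drops keys whose summed count is zero or negative; use c1.update(c2), which adds counts in place, if you need to keep them:

from collections import Counter

my_dict = {'key1': 21, 'key2': 10}
my_other_dict = {'key1': 18, 'key3': 5}

c_sum = Counter(my_dict) + Counter(my_other_dict)
print(dict(c_sum))  # {'key1': 39, 'key2': 10, 'key3': 5}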
On the other hand, you should be able to do this within pandas too: value_counts() returns a Series, and two Series can be combined with s1.add(s2, fill_value=0) (a plain + would produce NaN for any key missing from either side).
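A hedged sketch of that approach (the frames and the column name 'col' here are invented for illustration):

import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'a', 'b']})
df2 = pd.DataFrame({'col': ['a', 'c']})

# fill_value=0 treats a key missing from one side as 0 rather than NaN
counts = df1['col'].value_counts().add(df2['col'].value_counts(), fill_value=0)
print(counts.astype(int).to_dict())  # {'a': 3, 'b': 1, 'c': 1}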
Iterate over the key/value pairs of the new dictionary and update my_dict. You should also look into using defaultdict from the collections module:
my_dict = {'key1': 21, 'key2': 10}
my_dict2 = {'key1': 18, 'key3': 5}

for k, v in my_dict2.items():
    if k in my_dict:
        my_dict[k] += v
    else:
        my_dict[k] = v
Using defaultdict
from collections import defaultdict

my_dict = defaultdict(int, {'key1': 21, 'key2': 10})
my_dict2 = {'key1': 18, 'key3': 5}

for k, v in my_dict2.items():
    my_dict[k] += v
Here's another answer that uses collections as well:
from collections import defaultdict as ddict

some_list_of_dicts = [
    {'val1': 5, 'val2': 3},
    {'val1': 2, 'val2': 1, 'val3': 9},
]

my_dict = ddict(int)
for i in some_list_of_dicts:
    for key, count in i.items():
        my_dict[key] += count

print(dict(my_dict))
A defaultdict(int) initialises the value of an unknown key to 0 the first time that key is accessed.
How can I turn a list of dictionaries into a single dictionary?
For example, let's say my initial list is as:
Dictionary_list = [{key:value}, {key2:value2}, {key3:value3}]
I need the resultant dictionary as:
New_dictionary = {key:value, key2:value2, key3:value3}
Another solution would be to create an empty dictionary and update it:
>>> my_list = [{'key':'value'}, {'key2':'value2'}, {'key3':'value3'}]
>>> my_dict = {}
>>> for d in my_list: my_dict.update(d)
...
>>> my_dict
{'key': 'value', 'key2': 'value2', 'key3': 'value3'}
In general, the update() method is mighty useful, typically when you want to create "environments" containing variables from successive dictionaries.
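A tiny illustration of that layered-environments idea (the names here are invented):

defaults = {'colour': 'blue', 'size': 10}
user_prefs = {'size': 14}

env = {}
for layer in (defaults, user_prefs):
    env.update(layer)  # later layers override earlier ones
print(env)  # {'colour': 'blue', 'size': 14}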
You can use a dictionary comprehension to achieve this:
>>> my_list = [{'key':'value'}, {'key2':'value2'}, {'key3':'value3'}]
>>> my_dict = {k: v for item in my_list for k, v in item.items()}
>>> my_dict
{'key3': 'value3', 'key2': 'value2', 'key': 'value'}
Note: if any key is present in more than one dictionary in the initial list, the solution above keeps the last value seen for that key.
Functional programming answer:
from functools import reduce  # needed on Python 3; reduce is a builtin in Python 2
my_list = [{'key':'value'}, {'key2':'value2'}, {'key3':'value3'}]
def func(x, y):
    x.update(y)
    return x
new_dict = reduce(func, my_list)
>>> new_dict
{'key': 'value', 'key2': 'value2', 'key3': 'value3'}
One liner:
new_dict = reduce(lambda x, y: x.update(y) or x, my_list) # use side effect in lambda
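One caveat worth noting: both versions mutate the first dictionary in my_list. If that matters, pass an empty dict as reduce's initial value so the originals are left untouched:

new_dict = reduce(func, my_list, {})  # {} is the accumulator; my_list stays unmodified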
I have a list of dictionaries and a function that extracts a value from each dictionary in the list. The goal is to get a dictionary whose keys are the values returned by the given function when it is passed the dictionaries from the list. The value for each key should be the subset of dictionaries from the original list for which the function returned that key.
I know this explanation is very confusing, so I'm showing it in an implementation:
keygen = lambda x: x['key']

data = [{'key': 'key1', 'data': 'value2'},
        {'key': 'key3', 'data': 'value2'},
        {'key': 'key2', 'data': 'value2'},
        {'key': 'key2', 'data': 'value2'},
        {'key': 'key1', 'data': 'value2'}]
def merge_by_keygen(data, keygen):
    return_value = {}
    for dataset in data:
        if keygen(dataset) not in return_value:
            return_value[keygen(dataset)] = []
        return_value[keygen(dataset)].append(dataset)
    return return_value
merge_by_keygen(data, keygen)
returns:
{'key3': [{'data': 'value2', 'key': 'key3'}],
'key2': [{'data': 'value2', 'key': 'key2'}, {'data': 'value2', 'key': 'key2'}],
'key1': [{'data': 'value2', 'key': 'key1'}, {'data': 'value2', 'key': 'key1'}]}
I'm looking for a nicer and more compact implementation of the same logic, like some dictionary/list comprehensions. Thanks!
This is an ideal problem for itertools.groupby. Note that groupby only groups consecutive items, which is why the data must be sorted by the grouping key first.
Implementation
from itertools import groupby
from operator import itemgetter
groups = groupby(sorted(data, key=itemgetter('key')), key=itemgetter('key'))
data_dict = {k: list(g) for k, g in groups}
or, if you prefer a one-liner:

data_dict = {k: list(g)
             for k, g in groupby(sorted(data, key=itemgetter('key')),
                                 key=itemgetter('key'))}
Output
{'key1': [{'data': 'value2', 'key': 'key1'},
{'data': 'value2', 'key': 'key1'}],
'key2': [{'data': 'value2', 'key': 'key2'},
{'data': 'value2', 'key': 'key2'}],
'key3': [{'data': 'value2', 'key': 'key3'}]}
If you don't mind using a third-party package, this is easily done with toolz.groupby:
>>> import toolz
>>> toolz.groupby(keygen, data)
{'key1': [{'data': 'value2', 'key': 'key1'},
{'data': 'value2', 'key': 'key1'}],
'key2': [{'data': 'value2', 'key': 'key2'},
{'data': 'value2', 'key': 'key2'}],
'key3': [{'data': 'value2', 'key': 'key3'}]}
The same result is also obtained with toolz.groupby('key', data)
I don't think this is amenable to a comprehension, but you can make it tidier using a collections.defaultdict(list) instance:
import collections

def merge_by_keygen(data, keygen):
    return_value = collections.defaultdict(list)
    for dataset in data:
        key = keygen(dataset)
        return_value[key].append(dataset)
    return return_value
That looks pretty clean to me. You could experiment with where the keygen function gets called, but I think you'd probably lose clarity.
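Usage is unchanged; wrap the result in dict() if a plain dictionary is wanted (a small sketch using the data from the question):

result = merge_by_keygen(data, keygen)
print(dict(result))  # same grouping as the original implementation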
I think this does it:
return_value = {}
for d in data:
    return_value.setdefault(keygen(d), []).append(d)
You can write it as a list comprehension, but it's ugly to use the side effects of a comprehension to modify data while building up a list of None results only to throw it away:
r = {}
[r.setdefault(keygen(d), []).append(d) for d in data]
The core of your function mashes down into the dictionary's setdefault() method: calling the keygen, checking whether the key is already in the return dictionary, creating and storing an empty list if it isn't, and then querying the dictionary again to get the list ready to append to. All of that is done by setdefault().
I have the following dictionary:
my_dict = {'key1': {'key2': {'foo': 'bar'} } }
and I would like to append an entry to key1->key2->key3 with value 'blah' yielding:
my_dict = {'key1': {'key2': {'foo': 'bar', 'key3': 'blah'} } }
I am looking for a generic solution that is independent of the number of keys, i.e. key1->key2->key3->key4->key5 should work as well, even though keys from key3 on downwards do not exist. So that I get:
my_dict = {'key1': {'key2': {'foo': 'bar', 'key3': {'key4': {'key5': 'blah'} } } } }
Thanks in advance.
You can use the reduce() function to traverse a series of nested dictionaries:
from functools import reduce  # reduce is a builtin on Python 2, so this import is only needed on Python 3

def get_nested(d, path):
    return reduce(dict.__getitem__, path, d)
Demo:
>>> def get_nested(d, path):
... return reduce(dict.__getitem__, path, d)
...
>>> my_dict = {'key1': {'key2': {'foo': 'bar', 'key3': {'key4': {'key5': 'blah'}}}}}
>>> get_nested(my_dict, ('key1', 'key2', 'key3', 'key4', 'key5'))
'blah'
This version throws an exception when a key doesn't exist:
>>> get_nested(my_dict, ('key1', 'nonesuch'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in get_nested
KeyError: 'nonesuch'
but you could replace dict.__getitem__ with lambda d, k: d.setdefault(k, {}) to have it create empty dictionaries instead:
def get_nested_default(d, path):
    return reduce(lambda d, k: d.setdefault(k, {}), path, d)
Demo:
>>> def get_nested_default(d, path):
... return reduce(lambda d, k: d.setdefault(k, {}), path, d)
...
>>> get_nested_default(my_dict, ('key1', 'nonesuch'))
{}
>>> my_dict
{'key1': {'key2': {'key3': {'key4': {'key5': 'blah'}}, 'foo': 'bar'}, 'nonesuch': {}}}
To set a value at a given path, traverse over all keys but the last one, then use the final key in a regular dictionary assignment:
def set_nested(d, path, value):
    get_nested_default(d, path[:-1])[path[-1]] = value
This uses the get_nested_default() function to add empty dictionaries as needed:
>>> def set_nested(d, path, value):
... get_nested_default(d, path[:-1])[path[-1]] = value
...
>>> my_dict = {'key1': {'key2': {'foo': 'bar'}}}
>>> set_nested(my_dict, ('key1', 'key2', 'key3', 'key4', 'key5'), 'blah')
>>> my_dict
{'key1': {'key2': {'key3': {'key4': {'key5': 'blah'}}, 'foo': 'bar'}}}
An alternative to Martijn Pieters's excellent answer would be to use a nested defaultdict, rather than a regular dictionary:
from collections import defaultdict
nested = lambda: defaultdict(nested) # nested dictionary factory
my_dict = nested()
You can set values by using regular nested dictionary access semantics, and empty dictionaries will be created to fill the middle levels as necessary:
my_dict["key1"]["key2"]["key3"] = "blah"
This of course requires that the number of keys be known in advance when you write the code to set the value. If you want to be able to handle a variable-length list of keys, rather than a fixed number, you'll need functions to do the getting and setting for you, like in Martijn's answer.
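For completeness, a hedged sketch of such a variable-length setter on top of the nested defaultdict (set_by_path is my name, not from the answer):

from collections import defaultdict
from functools import reduce

nested = lambda: defaultdict(nested)  # nested dictionary factory

def set_by_path(d, path, value):
    # walk (auto-creating) every level but the last, then assign the value
    reduce(lambda sub, key: sub[key], path[:-1], d)[path[-1]] = value

my_dict = nested()
set_by_path(my_dict, ('key1', 'key2', 'key3', 'key4', 'key5'), 'blah')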