nicer way to merge list of dictionaries by key - python

I have a list of dictionaries and a function that extracts a value from each dictionary in that list. I want a dictionary whose keys are the values the function returns for those dictionaries, and whose values are the lists of dictionaries from the original list for which the function returned that key.
I know this explanation is confusing, so here is an implementation:
keygen = lambda x: x['key']
data = [{'key': 'key1',
         'data': 'value2'},
        {'key': 'key3',
         'data': 'value2'},
        {'key': 'key2',
         'data': 'value2'},
        {'key': 'key2',
         'data': 'value2'},
        {'key': 'key1',
         'data': 'value2'}]
def merge_by_keygen(data, keygen):
    return_value = {}
    for dataset in data:
        if keygen(dataset) not in return_value.keys():
            return_value[keygen(dataset)] = []
        return_value[keygen(dataset)].append(dataset)
    return return_value
merge_by_keygen(data, keygen)
returns:
{'key3': [{'data': 'value2', 'key': 'key3'}],
'key2': [{'data': 'value2', 'key': 'key2'}, {'data': 'value2', 'key': 'key2'}],
'key1': [{'data': 'value2', 'key': 'key1'}, {'data': 'value2', 'key': 'key1'}]}
I'm looking for a nicer and more compact implementation of the same logic, like some dictionary/list comprehensions. Thanks!

This is an ideal problem to be handled by itertools.groupby
Implementation
from itertools import groupby
from operator import itemgetter
groups = groupby(sorted(data, key=itemgetter('key')), key=itemgetter('key'))
data_dict = {k: list(g) for k, g in groups}
Or, if you prefer a single expression:
data_dict = {k: list(g)
             for k, g in groupby(sorted(data,
                                        key=itemgetter('key')),
                                 key=itemgetter('key'))}
Output
{'key1': [{'data': 'value2', 'key': 'key1'},
{'data': 'value2', 'key': 'key1'}],
'key2': [{'data': 'value2', 'key': 'key2'},
{'data': 'value2', 'key': 'key2'}],
'key3': [{'data': 'value2', 'key': 'key3'}]}
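One detail worth spelling out: itertools.groupby only groups consecutive items, which is why the input is sorted first. Here is a minimal sketch of what happens without the sort, using the question's data; the names get_key, unsorted_groups and sorted_groups are only illustrative:

```python
from itertools import groupby
from operator import itemgetter

data = [{'key': 'key1', 'data': 'value2'},
        {'key': 'key3', 'data': 'value2'},
        {'key': 'key2', 'data': 'value2'},
        {'key': 'key2', 'data': 'value2'},
        {'key': 'key1', 'data': 'value2'}]

get_key = itemgetter('key')

# Without sorting, 'key1' appears as two separate runs, and the later run
# overwrites the earlier one in the dict comprehension.
unsorted_groups = {k: list(g) for k, g in groupby(data, key=get_key)}
print(len(unsorted_groups['key1']))  # 1

# Sorting first makes equal keys adjacent, so each key forms a single run.
sorted_groups = {k: list(g)
                 for k, g in groupby(sorted(data, key=get_key), key=get_key)}
print(len(sorted_groups['key1']))  # 2
```

The sort also means this approach is O(n log n), whereas a plain loop over the data is O(n).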

If you don't mind using a third-party package, this is easily done with toolz.groupby:
>>> import toolz
>>> toolz.groupby(keygen, data)
{'key1': [{'data': 'value2', 'key': 'key1'},
{'data': 'value2', 'key': 'key1'}],
'key2': [{'data': 'value2', 'key': 'key2'},
{'data': 'value2', 'key': 'key2'}],
'key3': [{'data': 'value2', 'key': 'key3'}]}
The same result is also obtained with toolz.groupby('key', data)

I don't think this is amenable to a comprehension, but you can make it tidier using a collections.defaultdict(list) instance:
import collections
def merge_by_keygen(data, keygen):
    return_value = collections.defaultdict(list)
    for dataset in data:
        key = keygen(dataset)
        return_value[key].append(dataset)
    return return_value
That looks pretty clean to me - you could mess around with ways to move where you call the keygen function if you like but I think you'd probably lose clarity.
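One thing to be aware of with this version: the function returns a defaultdict, so looking up a missing key later silently inserts an empty list instead of raising KeyError. If that matters, a possible variation (a sketch, not part of the answer above) is to convert to a plain dict before returning:

```python
import collections

def merge_by_keygen(data, keygen):
    grouped = collections.defaultdict(list)
    for dataset in data:
        key = keygen(dataset)
        grouped[key].append(dataset)
    # Convert to a plain dict so that later lookups of missing keys raise
    # KeyError instead of quietly creating empty lists.
    return dict(grouped)

# Small made-up sample in the same shape as the question's data.
sample = [{'key': 'key1', 'data': 'a'},
          {'key': 'key2', 'data': 'b'},
          {'key': 'key1', 'data': 'c'}]
result = merge_by_keygen(sample, lambda x: x['key'])
print(len(result['key1']))   # 2
print('key3' in result)      # False
```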

I think this does it:
return_value = {}
for d in data:
    return_value.setdefault(keygen(d), []).append(d)
You can write it in a list comprehension, but it's ugly to use the side effects of a list comprehension to affect data and then build up a list of None results and throw it away...
r = {}
[r.setdefault(keygen(d), []).append(d) for d in data]
The core of your function all mashes down into the dictionary setdefault method. The three lines that call the keygen, check whether the key is in the return dictionary, create an empty list if it is not, store that list in the dictionary, and then query the dictionary again to get the list ready to append to: all of that collapses into a single setdefault() call.
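To make the setdefault behaviour concrete, here is a tiny standalone sketch (the names groups, first and second are just for illustration):

```python
groups = {}

# First call: 'a' is missing, so setdefault stores the new list and returns it.
first = groups.setdefault('a', [])
first.append(1)

# Second call: 'a' exists, so the stored list is returned and the fresh []
# passed as the default is simply discarded.
second = groups.setdefault('a', [])
second.append(2)

print(groups)           # {'a': [1, 2]}
print(first is second)  # True
```

Note that the default argument ([] here) is built on every call, even when the key already exists; for a cheap empty list that cost is negligible.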

Related

How to update a list of dicts in pymongo?

I have some records like that in collection:
{
'_id': 1,
'test_field': [{'key1': 'value1'}, {'key2': 'value2'}]
}
test_field is a list of dicts. I need to push a new dict onto that list if its key does not exist yet, and if the key does exist I need to update that key's value.
Examples:
{'key1': 'test_value'} → 'test_field': [{'key1': 'test_value'}, {'key2': 'value2'}]
{'test_key': 'test_value2'} → 'test_field': [{'key1': 'value1'}, {'key2': 'value2'}, {'test_key': 'test_value2'}]
Help please
If you need a function in Python to do it, this might work for you:
def modify_test_field(my_dict, test_field, new_key, new_val):
    my_dict[test_field] = [obj for obj in my_dict[test_field] if new_key not in obj]
    my_dict[test_field].append({new_key: new_val})
Call it like modify_test_field(orig_dict, 'test_field', new_key, new_val).
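Note that the function above removes the old entry and appends the new one at the end, so an updated key moves to the back of the list. If the position should stay as in the question's first example, a possible variation is to update in place. This is only a sketch: upsert_test_field is a made-up name, and it works on an in-memory dict only, so the changed document would still have to be written back to the collection separately.

```python
def upsert_test_field(my_dict, test_field, new_key, new_val):
    """Update the dict holding new_key in place, or append a new dict."""
    for obj in my_dict[test_field]:
        if new_key in obj:
            obj[new_key] = new_val   # key found: overwrite, keep position
            return
    my_dict[test_field].append({new_key: new_val})  # key not found: append

doc = {'_id': 1, 'test_field': [{'key1': 'value1'}, {'key2': 'value2'}]}

upsert_test_field(doc, 'test_field', 'key1', 'test_value')
print(doc['test_field'])
# [{'key1': 'test_value'}, {'key2': 'value2'}]

upsert_test_field(doc, 'test_field', 'test_key', 'test_value2')
print(doc['test_field'])
# [{'key1': 'test_value'}, {'key2': 'value2'}, {'test_key': 'test_value2'}]
```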

Removing nulls and empty objects of mixed data types from a dictionary

How would one go about cleaning nulls, empty lists, empty dicts, etc. out of a dictionary that contains a variety of data types? E.g.
raw = {'key': 'value', 'key1': [], 'key2': {}, 'key3': True, 'key4': None}
To:
refined = {'key': 'value', 'key3': True}
Because of the mixed nature of data types in the dictionary, using:
refined = {k:v for k,v in processed.items() if len(v)>0}
throws a
TypeError: object of type 'bool' has no len()
Is there a solution to make a second conditional based on type(v) is bool?
Edit: I've found the issue I was encountering employing solutions was a result of the structure of the data, asking a separate question to deal with that.
You can try this.
refined = {k: v for k, v in raw.items() if v or isinstance(v, bool)}
raw = {'key': 'value',
       'key1': [],
       'key2': {},
       'key3': True,
       'key4': None,
       'key5': False}
refined = {k: v for k, v in raw.items() if v or isinstance(v, bool)}
# {'key': 'value', 'key3': True, 'key5': False}
How about
refined = {k: v for k, v in processed.items() if v is not None and (type(v) not in (list, dict) or len(v) > 0)}
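The two answers differ in what they count as "empty": the check v or isinstance(v, bool) also drops other falsy values such as 0, while the type-based check keeps them. A sketch that makes the rule explicit in a small helper (is_empty is a hypothetical name, and the list of container types is an assumption about what should count as empty):

```python
def is_empty(value):
    """Return True for None and for empty strings/containers."""
    if value is None:
        return True
    # Only call len() on types that are known to support it.
    if isinstance(value, (str, bytes, list, tuple, dict, set, frozenset)):
        return len(value) == 0
    return False

raw = {'key': 'value', 'key1': [], 'key2': {}, 'key3': True,
       'key4': None, 'key5': False, 'key6': 0}

refined = {k: v for k, v in raw.items() if not is_empty(v)}
print(refined)
# {'key': 'value', 'key3': True, 'key5': False, 'key6': 0}
```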

String to dict parsing without commas

There is a weird string representation like
"key1:value1:key2:value2:key3:value3...keyn:valuen"
I need to create a dict, which is pretty easy when you have commas; however, there are only colons here, so you have to split the string at every second colon. Code with a loop or something like that looks pretty ugly, so I wonder if you could help me with a one-liner.
You can just split on colons, get an iterator over the tokens and zip the iterator with itself. That will pair keys and values nicely:
s = 'key1:value1:key2:value2:key3:value3:keyn:valuen'
it = iter(s.split(':'))
dict(zip(it, it))
# {'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyn': 'valuen'}
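In case it is not obvious why zipping an iterator with itself pairs things up, here is a short sketch contrasting it with zipping the list with itself (the names pairs and tokens are only illustrative):

```python
s = 'key1:value1:key2:value2'
it = iter(s.split(':'))

# zip asks its first argument for an item, then its second; both are the
# same iterator, so every pair consumes two consecutive tokens.
pairs = list(zip(it, it))
print(pairs)  # [('key1', 'value1'), ('key2', 'value2')]

# By contrast, zipping the list with itself pairs each token with itself.
tokens = s.split(':')
print(list(zip(tokens, tokens))[:2])  # [('key1', 'key1'), ('value1', 'value1')]
```

With an odd number of tokens, the trailing key is silently dropped, so validate the input first if that can happen.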
If you are uncomfortable with iter (as used in schwobaseggl's solution, which I consider superior), you can use zipped list slices in almost the same way:
s = 'key1:value1:key2:value2:key3:value3:keyn:valuen'
splitted = s.split(':')
# even_elements = splitted[::2] - take every 2nd element starting at index 0
# odd_elements = splitted[1::2] - take every 2nd element starting at index 1
k = {k: v for k, v in zip(splitted[::2], splitted[1::2])}
print(k)
Output:
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyn': 'valuen'}
Alternatively, you can create the dict by hand:
s = 'key1:value1:key2:value2:key3:value3:keyn:valuen'
splitted = s.split(':')
d = {}
# this generates 2-slices from the list and puts them into your dict
for k, v in (splitted[i:i+2] for i in range(0, len(splitted), 2)):
    d[k] = v
# or d = {k: v for k, v in (splitted[i:i+2] for i in range(0, len(splitted), 2))}
# or d = dict(splitted[i:i+2] for i in range(0, len(splitted), 2))
print(d)
Output:
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyn': 'valuen'}

list of dictionaries from single dictionary

I have a dictionary like this
dict1 = {'key1': 'value1', 'key2': 'value2'}
How do I get an array of the keys and values as dictionaries, like this:
array_of_dict_values = [{'key1': 'value1'}, {'key2': 'value2'}]
What would be the easiest way to accomplish this?
You can do this:
>>> aDict = {'key1': 'value1', 'key2': 'value2'}
>>> aList = [{k:v} for k, v in aDict.items()]
>>> aList
[{'key2': 'value2'}, {'key1': 'value1'}]
While somebody already answered with how to do this, I'm going to answer with "you probably don't want to do this." If every entry is a dictionary with a single key, wouldn't a list of key-value pairs work just as well?
dictionary = {'key1': 'value1', 'key2': 'value2'}
print(list(dictionary.items()))
# [('key2', 'value2'), ('key1', 'value1')]

single list to dictionary

I have this list:
single = ['key1', 'value1', 'key2', 'value2', 'key3', 'value3']
What's the best way to create a dictionary from this?
Thanks.
>>> single = ['key1', 'value1', 'key2', 'value2', 'key3', 'value3']
>>> dict(zip(single[::2], single[1::2]))
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
Similar to SilentGhost's solution, without building temporary lists:
>>> from itertools import izip  # Python 2 only; in Python 3 the built-in zip is already lazy
>>> single = ['key1', 'value1', 'key2', 'value2', 'key3', 'value3']
>>> si = iter(single)
>>> dict(izip(si, si))
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
This is the simplest, I guess. You will see more wizardry in the other solutions here, using list comprehensions etc.
dictObj = {}
for x in range(0, len(single), 2):
    dictObj[single[x]] = single[x+1]
Output:
>>> single = ['key1', 'value1', 'key2', 'value2', 'key3', 'value3']
>>> dictObj = {}
>>> for x in range(0, len(single), 2):
...     dictObj[single[x]] = single[x+1]
...
>>>
>>> dictObj
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
>>>
L = ['key1', 'value1', 'key2', 'value2', 'key3', 'value3']
d = dict(L[n:n+2] for n in xrange(0, len(L), 2))  # Python 2; use range in Python 3
>>> single = ['key', 'value', 'key2', 'value2', 'key3', 'value3']
>>> dict(zip(*[iter(single)]*2))
{'key3': 'value3', 'key2': 'value2', 'key': 'value'}
Probably not the most readable version though ;)
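For anyone puzzled by zip(*[iter(single)]*2): multiplying the list does not copy the iterator, it repeats a reference to the same one. A sketch that unpacks the expression step by step (the names it and args are only illustrative; the key order in the printed dict is what Python 3.7+ shows):

```python
single = ['key', 'value', 'key2', 'value2', 'key3', 'value3']

it = iter(single)
args = [it] * 2            # a list holding the SAME iterator object twice
print(args[0] is args[1])  # True

# zip(*args) is zip(it, it): each output pair pulls two consecutive items.
d = dict(zip(*args))
print(d)  # {'key': 'value', 'key2': 'value2', 'key3': 'value3'}
```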
You haven't specified any criteria for "best". If you want understandability, simplicity, easily modified to check for duplicates and odd number of inputs, works with any iterable (in case you can't find out the length in advance), NO EXTRA MEMORY USED, ... try this:
def load_dict(iterable):
    d = {}
    pair = False
    for item in iterable:
        if pair:
            # insert duplicate check here
            d[key] = item
        else:
            key = item
        pair = not pair
    if pair:
        grumble_about_odd_length(key)
    return d
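The answer leaves grumble_about_odd_length and the duplicate check as placeholders. One way they could be filled in, purely as a sketch: raising ValueError in both cases is an assumption made here, not something the answer specifies.

```python
def grumble_about_odd_length(key):
    # Hypothetical stand-in for the helper the answer leaves undefined.
    raise ValueError('odd number of items; no value for key %r' % (key,))

def load_dict(iterable):
    d = {}
    pair = False
    for item in iterable:
        if pair:
            # Assumed duplicate policy: reject repeated keys outright.
            if key in d:
                raise ValueError('duplicate key %r' % (key,))
            d[key] = item
        else:
            key = item
        pair = not pair
    if pair:
        grumble_about_odd_length(key)
    return d

# Works with any iterable, including a generator of unknown length.
print(load_dict(x for x in ['key1', 'value1', 'key2', 'value2']))
# {'key1': 'value1', 'key2': 'value2'}
```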
