I have a question regarding how to limit the value length of a dictionary:
real_vertrices_copy = {
'key': {1,3,8,4,2},
'key1': {9,4,2,4},
'key2': {6,4,2},
'key3': {9,3,5,3,5}
}
The dictionary consists of key value pairs, with the values varying in length.
I want to limit the length of the values to 3, so if the length of a value is > 3, I want to pop random elements as many times as necessary to reduce it to len=3.
I tried it like this:
for x in real_vertrices_copy.values():
    if len(x) > 20:
        real_vertrices_copy.popitem()
but this just removes random dictionary entries (I did not expect this to work, I have to admit)
Does anyone have an idea how to solve this?
Thank you and BR
You could just pop three times from each set with more than three elements to get three arbitrary elements:
d = {
'key': {1,3,8,4,2},
'key1': {9,4,2,4},
'key2': {6,4,2},
'key3': {9,3,5,3,5}
}
d = {
    k: {v.pop() for _ in range(3)} if len(v) > 3 else v
    for k, v in d.items()
}
This assumes that there are no other references to the sets in d because the mutation via pop will be seen across all references.
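The aliasing caveat can be seen directly. The following is a minimal sketch, assuming Python 3; the names pop_three and pop_three_copy are made up for illustration. It shows the original set shrinking when it is popped through the dictionary, and a variant that pops from a copy so other references stay intact.

```python
def pop_three(d):
    # Same comprehension as in the answer: pops from the original sets.
    return {k: {v.pop() for _ in range(3)} if len(v) > 3 else v
            for k, v in d.items()}


def pop_three_copy(d):
    # Hypothetical variant: pop from a copy, so the caller's sets are untouched.
    out = {}
    for k, v in d.items():
        tmp = set(v)
        out[k] = {tmp.pop() for _ in range(3)} if len(tmp) > 3 else tmp
    return out


shared = {1, 3, 8, 4, 2}

safe = pop_three_copy({'key': shared})
size_after_copy = len(shared)   # still 5, the shared set was not modified

mutated = pop_three({'key': shared})
size_after_pop = len(shared)    # 2, three elements were popped from the shared set
```

Which variant fits depends on whether anything else still holds a reference to the original sets.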
You can use a dictionary comprehension with random sampling (random.sample):
d = {'key': {1,3,8,4,2},
'key1': {9,4,2,4},
'key2': {6,4,2},
'key3': {9,3,5,3,5}
}
import random
out = {k: set(random.sample(list(v), 3)) for k,v in d.items()}
or, if it is possible that the input has fewer than 3 elements:
N = 3
out = {k: set(random.sample(list(v), N)) if len(v)>=N else v
for k,v in d.items()}
example output:
{'key': {1, 3, 8}, 'key1': {2, 4, 9}, 'key2': {2, 4, 6}, 'key3': {3, 5, 9}}
classical loop
If you need a classical loop and want to get both kept and discarded items, take advantage of set operations:
N = 3
keep = {}
discard = {}
for k,v in d.items():
    keep[k] = set(random.sample(list(v), min(len(v), N)))
    discard[k] = v.difference(keep[k])
print(keep)
# {'key': {1, 2, 4}, 'key1': {9, 2, 4}, 'key2': {2, 4, 6}, 'key3': {9, 3, 5}}
print(discard)
# {'key': {8, 3}, 'key1': set(), 'key2': set(), 'key3': set()}
Inplace update of real_vertrices_copy:
import random
real_vertrices_copy = {'key': {1,3,8,4,2},
'key1': {9,4,2,4},
'key2': {6,4,2},
'key3': {9,3,5,3,5}
}
for k, v in real_vertrices_copy.items():
    while len(v) > 3:
        v.remove(random.choice(tuple(v)))
I have a list of nested dictionaries:
[{'a': 1,
'b': 'string',
'c': [{'key1': 80,
'key2': 'string',
'key3': 4033},
{'key1': 324,
'key2': 'string',
'key3': 4034,
'key4': 1}]},
{'a': 1,
'b': 'string',
'c': [{'key1': 80,
'key2': 'string',
'key3': 4033},
{'key1': 324,
'key2': 'string',
'key3': 4034,
'key4': 1,
'key5': 2}]}]
Please note that the value of key c is again a list of dictionaries.
Now I want to filter out from this list all dictionaries with key c, that do not contain key1, key2, key3 & key4.
I thought of looping first over the first, second, and so on dict in the list, and then looping over the nested dicts that have c as a key. Then, if the dict inside c does not meet my requirement, I delete it.
Therefore my code would be:
for j in range(len(mydict)):
    for i in range(len(mydict[j]["c"])):
        if not all (k in mydict[j]["c"][i] for k in ("key1", "key2", "key3", "key4")):
            del(mydict[j]["c"][i])
But I am getting an IndexError: list index out of range error. Where is my mistake?
My desired output would be:
[{'a': 1,
'b': 'string',
'c': [{'key1': 324,
'key2': 'string',
'key3': 4034,
'key4': 1}]},
{'a': 1,
'b': 'string',
'c': [{'key1': 324,
'key2': 'string',
'key3': 4034,
'key4': 1,
'key5': 2}]}]
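The problem is that with for i in range(len(mydict[j]["c"])): you are iterating over the lists in the dict while at the same time removing items from those lists. The range is computed once from the original length, so after a deletion the list is shorter and a later index no longer exists. Instead, you can replace the inner loop with a list comprehension: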
for d in mydict:
    d['c'] = [d2 for d2 in d['c']
              if all(k in d2 for k in ("key1", "key2", "key3", "key4"))]
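If the lists really must be modified in place (for example because other code holds references to them), one option is to walk the indices from the end, so that a deletion never shifts an index that is still to be visited. This is only a sketch, assuming Python 3; prune_in_place and REQUIRED are illustrative names, not part of the question.

```python
REQUIRED = ("key1", "key2", "key3", "key4")


def prune_in_place(records, required=REQUIRED):
    """Delete inner dicts that miss any required key, mutating the lists in place."""
    for record in records:
        inner = record["c"]
        # Iterate backwards: deleting index i only moves items at positions > i,
        # which have already been checked.
        for i in range(len(inner) - 1, -1, -1):
            if not all(k in inner[i] for k in required):
                del inner[i]
    return records


mydict = [
    {"a": 1, "b": "string",
     "c": [{"key1": 80, "key2": "string", "key3": 4033},
           {"key1": 324, "key2": "string", "key3": 4034, "key4": 1}]},
    {"a": 1, "b": "string",
     "c": [{"key1": 80, "key2": "string", "key3": 4033},
           {"key1": 324, "key2": "string", "key3": 4034, "key4": 1, "key5": 2}]},
]
prune_in_place(mydict)
```

The list comprehension is usually simpler; the reverse loop only matters when the identity of the inner lists has to be preserved.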
Just to have another option:
keep = {'key1', 'key2', 'key3', 'key4'}
for h in mydict:
    h['c'] = [ e for e in h['c'] if len(keep - set(e.keys())) == 0 ]
If you wanted another perspective on it:
import copy
def remove_keys(mydict):
    mydict2 = copy.deepcopy(mydict)  # a real copy; plain assignment would only alias the input
    keys = ['key1', 'key2', 'key3', 'key4']
    for x in mydict2:
        # rebuild each list instead of deleting while enumerating, which skips items
        x['c'] = [y for y in x['c'] if all(key in y for key in keys)]
    return mydict2
Returns a new list with the modifications and leaves the original untouched.
I'm on Python 2.7 and have looked at several solutions here which work if you know how many dictionaries you are merging, but I could have anything between 2 and 5.
I have a loop which generates a dict with the same keys but different values. I want to add the new values to the previous.
Such as:
for num in numbers:
    dict = (function which outputs a dictionary)
    [merge with dictionary from previous run of the loop]
So if:
dict (from loop one) = {'key1': 1,
'key2': 2,
'key3': 3}
and
dict (from loop two) = {'key1': 4,
'key2': 5,
'key3': 6}
The resultant dict would be:
dict = {'key1': [1,4],
'key2': [2,5],
'key3': [3,6]}
Use a defaultdict:
In [18]: def gen_dictionaries():
    ...:     yield {'key1': 1, 'key2': 2, 'key3': 3}
    ...:     yield {'key1': 4, 'key2': 5, 'key3': 6}
    ...:
In [19]: from collections import defaultdict
In [20]: final = defaultdict(list)
In [21]: for d in gen_dictionaries():
    ...:     for k, v in d.iteritems():
    ...:         final[k].append(v)
    ...:
In [22]: final
Out[22]: defaultdict(list, {'key1': [1, 4], 'key2': [2, 5], 'key3': [3, 6]})
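For readers on Python 3, where iteritems() no longer exists, the same idea can be wrapped in a small function that accepts any number of dictionaries. A minimal sketch; merge_dicts is an illustrative name, not something from the question.

```python
from collections import defaultdict


def merge_dicts(dicts):
    """Collect the values of every key across an iterable of dicts into lists."""
    merged = defaultdict(list)
    for d in dicts:
        for k, v in d.items():   # .items() replaces Python 2's .iteritems()
            merged[k].append(v)
    return dict(merged)          # plain dict, so missing keys raise KeyError again


result = merge_dicts([
    {'key1': 1, 'key2': 2, 'key3': 3},
    {'key1': 4, 'key2': 5, 'key3': 6},
])
# result == {'key1': [1, 4], 'key2': [2, 5], 'key3': [3, 6]}
```

Because the argument is any iterable, a generator that yields the dictionaries one loop iteration at a time works as well.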
One generic way to achieve this is to use a set to find the union of the keys of all the dicts and then use a dictionary comprehension to build the desired dict:
>>> dict_list = [d1, d2] # list of all the dictionaries which are to be joined
# union of the keys present in any of the dicts (despite the name common_keys)
>>> common_keys = set(dict_list[0]).union(*dict_list[1:])
>>> {k: [d[k] for d in dict_list if k in d] for k in common_keys}
{'key3': [3, 6], 'key2': [2, 5], 'key1': [1, 4]}
where d1 and d2 are:
d1 = {'key1': 1,
'key2': 2,
'key3': 3}
d2 = {'key1': 4,
'key2': 5,
'key3': 6}
Explanation: Here, dict_list is the list of all the dict objects you want to combine. Then I create common_keys, the set of keys that appear in any of the dict objects (their union). Finally, I create a new dictionary using a dictionary comprehension (with a nested, filtered list comprehension).
Based on comment from OP, since all the dicts hold the same keys, we can skip the usage of set. Hence the code could be written as:
>>> dict_list = [d1, d2]
>>> {k: [d[k] for d in dict_list] for k in dict_list[0]}
{'key3': [3, 6], 'key2': [2, 5], 'key1': [1, 4]}
dict1 = {'m': 2, 'n':4, 'o':7, 'p':90}
dict2 = {'m': 1, 'n': 3}
dict3 = {}
for key,value in dict1.iteritems():
    if key not in dict3:
        dict3[key] = list()
    dict3[key].append(value)
for key,value in dict2.iteritems():
    if key not in dict3:
        dict3[key] = list()
    dict3[key].append(value)
print(dict3)
The output looks like this :
{'p': [90], 'm': [2, 1], 'o': [7], 'n': [4, 3]}
What is the best way to split a dictionary in half?
d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
I'm looking to do this:
d1 = {'key1': 1, 'key2': 2, 'key3': 3}
d2 = {'key4': 4, 'key5': 5}
It does not matter which keys/values go into each dictionary. I am simply looking for the simplest way to divide a dictionary into two.
This would work, although I didn't test edge-cases:
>>> d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
>>> d1 = dict(d.items()[len(d)/2:])
>>> d2 = dict(d.items()[:len(d)/2])
>>> print d1
{'key1': 1, 'key5': 5, 'key4': 4}
>>> print d2
{'key3': 3, 'key2': 2}
In python3:
d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
d1 = dict(list(d.items())[len(d)//2:])
d2 = dict(list(d.items())[:len(d)//2])
Also note that the order of items is not guaranteed on Python versions before 3.7.
Here's a way to do it using an iterator over the items in the dictionary and itertools.islice:
import itertools
def splitDict(d):
    n = len(d) // 2          # length of smaller half
    i = iter(d.items())      # alternatively, i = d.iteritems() works in Python 2
    d1 = dict(itertools.islice(i, n))   # grab first n items
    d2 = dict(i)                        # grab the rest
    return d1, d2
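The same islice technique extends from two halves to chunks of a fixed size. The sketch below assumes Python 3.7+ (so the item order is the insertion order); chunk_dict is a hypothetical helper name.

```python
from itertools import islice


def chunk_dict(d, size):
    """Yield successive dicts holding at most `size` items of d."""
    it = iter(d.items())
    while True:
        chunk = dict(islice(it, size))  # take the next `size` items from the shared iterator
        if not chunk:                   # iterator exhausted
            return
        yield chunk


d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
parts = list(chunk_dict(d, 2))
# parts == [{'key1': 1, 'key2': 2}, {'key3': 3, 'key4': 4}, {'key5': 5}]
```

Only one chunk is materialized at a time, which may matter for large dictionaries.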
d1 = {key: value for i, (key, value) in enumerate(d.viewitems()) if i % 2 == 0}
d2 = {key: value for i, (key, value) in enumerate(d.viewitems()) if i % 2 == 1}
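If you use Python 3.3 or later and want your split dictionaries to be the same across different Python invocations, do not rely on the order of .items() on versions before 3.6: there, the hash values of string keys, which determine the order of .items(), change between Python invocations. CPython 3.6 preserves insertion order as an implementation detail, and Python 3.7 guarantees it.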
See Hash randomization
The answer by jone did not work for me. I had to cast to a list before I could index the result of the .items() call. (I am running Python 3.6 in the example)
d = {'one':1, 'two':2, 'three':3, 'four':4, 'five':5}
split_idx = 3
d1 = dict(list(d.items())[:split_idx])
d2 = dict(list(d.items())[split_idx:])
"""
output:
d1
{'one': 1, 'three': 3, 'two': 2}
d2
{'five': 5, 'four': 4}
"""
Note the dicts are not necessarily stored in the order of creation so the indexes may be mixed up.
Here is a function that can be used to split a dictionary into any number of parts (Python 2; note that it empties the input dictionary as it goes).
import math
def linch_dict_divider(raw_dict, num):
    list_result = []
    len_raw_dict = len(raw_dict)
    if len_raw_dict > num:
        base_num = len_raw_dict / num
        addr_num = len_raw_dict % num
        for i in range(num):
            this_dict = dict()
            keys = list()
            if addr_num > 0:
                keys = raw_dict.keys()[:base_num + 1]
                addr_num -= 1
            else:
                keys = raw_dict.keys()[:base_num]
            for key in keys:
                this_dict[key] = raw_dict[key]
                del raw_dict[key]
            list_result.append(this_dict)
    else:
        for d in raw_dict:
            this_dict = dict()
            this_dict[d] = raw_dict[d]
            list_result.append(this_dict)
    return list_result
myDict = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
print myDict
myList = linch_dict_divider(myDict, 2)
print myList
We can do this efficiently with itertools.zip_longest() (note this is itertools.izip_longest() in 2.x):
from itertools import zip_longest
d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
items1, items2 = zip(*zip_longest(*[iter(d.items())]*2))
d1 = dict(item for item in items1 if item is not None)
d2 = dict(item for item in items2 if item is not None)
Which gives us:
>>> d1
{'key3': 3, 'key1': 1, 'key4': 4}
>>> d2
{'key2': 2, 'key5': 5}
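The one-liner relies on a less obvious idiom: [iter(d.items())]*2 is a list that holds the same iterator twice, so zip_longest draws consecutive items from it alternately and pads the last group with None. The sketch below spells the steps out, assuming Python 3.7+ so that the item order is the insertion order.

```python
from itertools import zip_longest

d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}

it = iter(d.items())
# Both arguments are the *same* iterator, so each tuple holds two consecutive items.
pairs = list(zip_longest(it, it))
# pairs == [(('key1', 1), ('key2', 2)), (('key3', 3), ('key4', 4)), (('key5', 5), None)]

# Transpose the pairs into "first of each pair" and "second of each pair".
items1, items2 = zip(*pairs)
d1 = dict(item for item in items1 if item is not None)
d2 = dict(item for item in items2 if item is not None)
```

Note that the two-way unpacking of zip(*pairs) fails for an empty dictionary, since there is nothing to transpose; that case would need a separate check.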
Here's a function that I use in Python 3.8 that can split a dict into a list containing the desired number of parts. If you specify more parts than elements, you'll get some empty dicts in the resulting list.
def split_dict(input_dict: dict, num_parts: int) -> list:
    list_len: int = len(input_dict)
    return [dict(list(input_dict.items())[i * list_len // num_parts:(i + 1) * list_len // num_parts])
            for i in range(num_parts)]
Output:
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
>>> split_dict(d, 2)
[{'a': 1, 'b': 2}, {'c': 3, 'd': 4, 'e': 5}]
>>> split_dict(d, 3)
[{'a': 1}, {'b': 2, 'c': 3}, {'d': 4, 'e': 5}]
>>> split_dict(d, 7)
[{}, {'a': 1}, {'b': 2}, {}, {'c': 3}, {'d': 4}, {'e': 5}]
If you use numpy, you could do this:
def divide_dict(dictionary, chunk_size):
    '''
    Divide one dictionary into several dictionaries
    Return a list, each item is a dictionary
    Note: chunk_size is the number of chunks, not the size of each chunk
    '''
    import numpy, collections
    # chunk_size + 1 evenly spaced boundaries give chunk_size groups
    count_ar = numpy.linspace(0, len(dictionary), chunk_size+1, dtype= int)
    group_lst = []
    temp_dict = collections.defaultdict(lambda : None)
    i = 1
    for key, value in dictionary.items():
        temp_dict[key] = value
        if i in count_ar:
            group_lst.append(temp_dict)
            temp_dict = collections.defaultdict(lambda : None)
        i += 1
    return group_lst
What I need to do is to convert something like this
{'key1': [1, 2, 3], 'key2': [4, 5, 6]}
into
[{'key1': 1, 'key2': 4}, {'key1': 2, 'key2': 5}, {'key1': 3, 'key2': 6}]
The length of the value lists can vary!
What's the quickest way to do this (preferably without for loops)?
Works for any number of keys
>>> map(dict, zip(*[[(k, v) for v in value] for k, value in d.items()]))
[{'key2': 4, 'key1': 1}, {'key2': 5, 'key1': 2}, {'key2': 6, 'key1': 3}]
For example:
d = {'key3': [7, 8, 9], 'key2': [4, 5, 6], 'key1': [1, 2, 3]}
>>> map(dict, zip(*[[(k, v) for v in value] for k, value in d.items()]))
[{'key3': 7, 'key2': 4, 'key1': 1}, {'key3': 8, 'key2': 5, 'key1': 2}, {'key3': 9, 'key2': 6, 'key1': 3}]
A general solution that works on any number of values or keys: (python2.6)
>>> from itertools import izip_longest
>>> d = {'key2': [3, 4, 5, 6], 'key1': [1, 2]}
>>> map(lambda a: dict(filter(None, a)), izip_longest(*[[(k, v) for v in value] for k, value in d.items()]))
[{'key2': 3, 'key1': 1}, {'key2': 4, 'key1': 2}, {'key2': 5}, {'key2': 6}]
And if you don't have python2.6:
>>> d = {'key2': [3, 4, 5, 6], 'key1': [1, 2]}
>>> map(lambda a: dict(filter(None, a)), map(None, *[[(k, v) for v in value] for k, value in d.items()]))
[{'key2': 3, 'key1': 1}, {'key2': 4, 'key1': 2}, {'key2': 5}, {'key2': 6}]
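The snippets above are Python 2, where map returns a list and map(None, ...) pads like izip_longest. A rough Python 3 equivalent is sketched below; the function name and the _MISSING sentinel are illustrative choices, and the sentinel is used instead of None so that legitimate None values in the lists are kept.

```python
from itertools import zip_longest

_MISSING = object()  # padding marker that cannot collide with real values


def dict_of_lists_to_list_of_dicts(d):
    """Turn {'k': [v0, v1, ...], ...} into [{'k': v0, ...}, {'k': v1, ...}, ...]."""
    keys = list(d)
    return [
        # Skip keys whose list was shorter than the current row index.
        {k: v for k, v in zip(keys, row) if v is not _MISSING}
        for row in zip_longest(*d.values(), fillvalue=_MISSING)
    ]


rows = dict_of_lists_to_list_of_dicts({'key2': [3, 4, 5, 6], 'key1': [1, 2]})
# rows == [{'key2': 3, 'key1': 1}, {'key2': 4, 'key1': 2}, {'key2': 5}, {'key2': 6}]
```

When all lists are known to have the same length, plain zip can replace zip_longest and the sentinel check can be dropped.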
Assuming the number of keys, and values per key, are both arbitrary and a priori unknown, it's simplest to get the result with for loops, of course:
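import itertools
itit = thedict.iteritems()
k, vs = next(itit)
result = [{k: v} for v in vs]   # one dict per value of the first key
for k, vs in itit:
    for d, v in itertools.izip(result, vs):
        d[k] = v                # fill in the remaining keys position by position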
It can be collapsed, but I'm dubious about the performance implications of doing so (if the data structures involved are so huge as to warrant performance optimization, building any extra auxiliary structure in memory beyond what's strictly required may turn out costly -- this simple approach of mine is being especially careful to avoid any such intermediate structures).
Edit: another alternative, particularly interesting if the overall data structures are huge but in some use cases you may only need "bits and pieces" of the "transformed" structure, is to build a class that provides the interface you require, but does so "on the fly", rather than in a "big bang", "once and for all" transformation (this might be especially helpful if the original structure can change and the transformed one needs to reflect the present state of the original, etc, etc).
Of course, for such a purpose it's very helpful to identify exactly what features of a "list of dictionaries" your downstream code would use. Suppose for example that all you need is actually "read-only" indexing (not changing, iterating, slicing, sorting, ...): X[x] must return a dictionary in which each key k maps to a value such that (calling O the original dictionary of lists) X[x][k] is O[k][x]. Then:
class Wrap1(object):
    def __init__(self, O):
        self.O = O
    def __getitem__(self, x):
        return dict((k, vs[x]) for k, vs in self.O.iteritems())
If you don't in fact need the wrapped structure to track modifications to the original one, then __getitem__ might well also "cache" the dict it's returning:
class Wrap2(object):
    def __init__(self, O):
        self.O = O
        self.cache = {}
    def __getitem__(self, x):
        r = self.cache.get(x)
        if r is None:
            r = self.cache[x] = dict((k, vs[x]) for k, vs in self.O.iteritems())
        return r
Note that this approach may end up with some duplication in the cache, e.g., if O's lists have 7 items each, the cache at x==6 and x==-1 may end up with two equal dicts; if that's a problem you can, for example, normalize negative xs in __getitem__ by adding the length of O's lists to them before proceeding (not len(self.O), which is the number of keys).
If you also need iteration, as well as this simple indexing, that's not too hard: just add an __iter__ method, easily implemented e.g. as a simple generator...:
    def __iter__(self):
        n = len(next(self.O.itervalues()))  # length of the value lists, assumed equal
        for i in xrange(n):
            yield self[i]
And so forth, incrementally, if and as you need more and more of a list's functionality (at worst, once you have implemented this __iter__, you can build self.L = list(self) -- reverting to the "big bang" approach -- and, for any further request, punt to self.L... but you'll have to make a special metaclass if you want to take that approach for special methods as well, or use some subtler trick such as self.__class__ = list; self[:] = self.L followed by appropriate dels;-).
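For Python 3, much of this incremental work can be delegated to collections.abc.Sequence, which supplies iteration, membership tests, index and count once __getitem__ and __len__ exist. The class below is only a sketch under stated assumptions: ListOfDictsView is a made-up name, only integer indexing is handled (no slices), and the view's length is taken to be that of the shortest list.

```python
from collections.abc import Sequence


class ListOfDictsView(Sequence):
    """Read-only 'list of dicts' view over a dict of lists, computed on the fly."""

    def __init__(self, columns):
        self.columns = columns  # kept by reference, so later changes are visible

    def __len__(self):
        # Assumption: rows exist only up to the shortest list.
        return min((len(vs) for vs in self.columns.values()), default=0)

    def __getitem__(self, x):
        n = len(self)
        if x < 0:
            x += n              # normalize negative indices against the row count
        if not 0 <= x < n:
            raise IndexError("row index out of range")
        return {k: vs[x] for k, vs in self.columns.items()}


O = {'key1': [1, 2, 3], 'key2': [4, 5, 6]}
view = ListOfDictsView(O)
first, last = view[0], view[-1]
all_rows = list(view)           # iteration comes from the Sequence mixin
```

Because nothing is cached, the view always reflects the current state of the original dictionary; adding a cache as in Wrap2 would trade that for speed.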
If there are always two keys you can use:
[{'key1':a, 'key2':b} for (a,b) in zip(d['key1'], d['key2'])]
>>> a = {'key1': [1, 2, 3], 'key2': [4, 5, 6]}
>>> [dict((key, a[key][i]) for key in a.keys()) for i in range(len(a.values()[0]))]
[{'key2': 4, 'key1': 1}, {'key2': 5, 'key1': 2}, {'key2': 6, 'key1': 3}]
d = {'key1': [1, 2, 3], 'key2': [4, 5, 6]}
keys = d.keys()
vals = zip(*[d[k] for k in keys])
l = [dict(zip(keys, v)) for v in vals]
print l
produces
[{'key2': 4, 'key1': 1}, {'key2': 5, 'key1': 2}, {'key2': 6, 'key1': 3}]
Without a for loop (map still iterates internally, just without the for keyword):
>>> x={'key1': [1, 2, 3], 'key2': [4, 5, 6]}
>>> map(lambda x,y:{'key1':x,'key2':y},x['key1'],x['key2'])
[{'key2': 4, 'key1': 1}, {'key2': 5, 'key1': 2}, {'key2': 6, 'key1': 3}]
How about?
d = {'key1': [1, 2, 3], 'key2': [4, 5, 6]}
[dict(zip(d.keys(),i)) for i in zip(*d.values())]
Returns:
[{'key1': 1, 'key2': 4}, {'key1': 2, 'key2': 5}, {'key1': 3, 'key2': 6}]
list(map( dict, zip(*([(key, val) for val in data[key]] for key in data.keys()))))