I have a dictionary with 259136 keys, and each key has one or more values.
My objective is to find the keys that have at least one value in common with another key in the dictionary.
I have tried different ways to deal with this problem, but I am looking for a faster solution. I tried:
comparing each key with the other 259135 keys to check the above condition;
reversing the dictionary from key-value to value-key, so that each value becomes a key. This gives me two dictionaries: I can take the values from the first one and use them to pull the matching entries out of the second one.
Use a dict of sets:
d={ 'k1': [1,2,3],
'k2': [2],
'k3': [10],
'k4': [3,2]
}
com_keys={}
for k, v in d.items():
    for e in v:
        com_keys.setdefault(e, set()).add(k)
print com_keys
# {1: set(['k1']), 10: set(['k3']), 3: set(['k1', 'k4']), 2: set(['k2', 'k1', 'k4'])}
Then if you only want the ones that have more than one key in common, just filter with a dict comprehension (or the like for older Pythons):
>>> {k:v for k,v in com_keys.items() if len(v)>1 }
{2: set(['k2', 'k1', 'k4']), 3: set(['k1', 'k4'])}
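For Python 3, the same idea can be sketched as below, going one step further and collecting the keys themselves, which is what the question asks for. The function name `keys_sharing_a_value` is made up for illustration, and the values are assumed to be hashable.

```python
def keys_sharing_a_value(d):
    """Return the set of keys that share at least one value with another key."""
    # Inverted index: value -> set of keys whose list contains that value
    com_keys = {}
    for k, values in d.items():
        for e in values:
            com_keys.setdefault(e, set()).add(k)
    # A value seen under two or more keys marks all of those keys as overlapping
    shared = set()
    for keys in com_keys.values():
        if len(keys) > 1:
            shared |= keys
    return shared


d = {'k1': [1, 2, 3], 'k2': [2], 'k3': [10], 'k4': [3, 2]}
print(sorted(keys_sharing_a_value(d)))  # ['k1', 'k2', 'k4']
```

Each value is visited once, so the work grows with the total number of values rather than with the number of key pairs.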
It gets a little more challenging if your dict is a non-homogeneous combination of containers that support iteration (lists, tuples, etc.) and 'single items' that either do not support iteration (ints, floats) or that you do not want to iterate over with a for loop (strings, unicode, other dicts, etc.).
For example, assume you have a combination of lists and 'single items' that are ints and strings:
import collections
d={ 'k1': [1,2,3],
'k2': 2,
'k3': [10],
'k4': [3,2],
'k5': 'string',
'k6': ['string',2]
}
com_keys={}
for k, v in d.items():
    if not isinstance(v, basestring) and isinstance(v, collections.Iterable):
        for e in v:
            com_keys.setdefault(e, set()).add(k)
    else:
        com_keys.setdefault(v, set()).add(k)
print com_keys
# {1: set(['k1']), 10: set(['k3']), 3: set(['k1', 'k4']), 2: set(['k2', 'k1', 'k6', 'k4']), 'string': set(['k6', 'k5'])}
print {k:v for k,v in com_keys.items() if len(v)>1 }
# {2: set(['k2', 'k1', 'k6', 'k4']), 3: set(['k1', 'k4']), 'string': set(['k6', 'k5'])}
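The snippet above is Python 2 (`basestring`, the `print` statement). A rough Python 3 equivalent might look like the sketch below, where `Iterable` comes from `collections.abc` and `str`/`bytes` stand in for `basestring`. The helper name `invert_mixed` is hypothetical, and the sketch assumes every single item is hashable.

```python
from collections.abc import Iterable


def invert_mixed(d):
    """Map each value (or element of an iterable value) to the keys holding it."""
    com_keys = {}
    for k, v in d.items():
        # Strings are iterable, but here they are treated as single items
        if isinstance(v, Iterable) and not isinstance(v, (str, bytes)):
            items = v
        else:
            items = [v]
        for e in items:
            com_keys.setdefault(e, set()).add(k)
    return com_keys


d = {'k1': [1, 2, 3],
     'k2': 2,
     'k3': [10],
     'k4': [3, 2],
     'k5': 'string',
     'k6': ['string', 2]}

shared = {k: v for k, v in invert_mixed(d).items() if len(v) > 1}
print(shared)
```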
I have a list of keys:
l_keys = ['a', 'c', 'd']
And I have a list of dictionary:
l_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a':4, 'd':5}]
The result I want to get is:
[{'a': 1, 'c': 3}, {'a': 4, 'd': 5}]
I can achieve this result in the following way
[{k: d[key] for k in l_keys if k in l_dict} for d in l_dict]
Explanation:
I go through every object in l_dict, then through every key in l_keys, and check whether that key is in the current object; if so, I retrieve it and its value.
My question is whether there is a better, more professional and faster way to do that in terms of time complexity.
Firstly, your list comprehension should be: [{k: d[k] for k in l_keys if k in d} for d in l_dict]
If you know that len(l_keys) will usually be smaller than the dicts in l_dict, your way is the most efficient. Otherwise, it would be better to iterate over each dict and check whether each of its keys is in l_keys: [{k: d[k] for k in d if k in l_keys} for d in l_dict]. Since a membership test on a list takes linear time, convert l_keys to a set first, l_set = set(l_keys), and test against that: [{k: d[k] for k in d if k in l_set} for d in l_dict]
This page might be helpful when it comes to time complexity: https://wiki.python.org/moin/TimeComplexity#dict
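As a sketch of the set-based idea: dictionary key views support set operations, so the filtering can also be written as an intersection. The names `wanted` and `result` are only illustrative, and the order of keys inside each resulting dict is not guaranteed with this form.

```python
l_keys = ['a', 'c', 'd']
l_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'd': 5}]

# Build the set once so it is not recreated for every dict
wanted = set(l_keys)

# d.keys() & wanted yields only the keys present in both
result = [{k: d[k] for k in d.keys() & wanted} for d in l_dict]
print(result)
```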
I have source dictionary, please refer sample below. It has deep level nested sub-dictionaries and lists, and I need to remove all keys where the value is "nan".
data = {"col1":"val1","col2":"val2","col3":"val3","col4":"val3","list1":[{"l1":"v1","l2":"nan"},{"K1":"Kv1","K2":"nan"},{"M1":"Mv1","M2":"nan","sublist1":[{"SL1":"SV1","SL2":"nan"}]}],"list2":[{"l1":"v1","l2":"nan"},{"K1":"Kv1","K2":"nan"},{"M1":"Mv1","M2":"nan","sublist2":[{"SL1":"SV1","SL2":"nan"}]}]}
I tried following code by creating a function but it is not working as expected:
def cleanNullTerms(d):
clean = {}
for k, v in d.items():
if isinstance(v, list):
for values in v:
nested = cleanNullTerms(values)
if values =='nan':
clean[k] = nested
elif v is not 'nan':
clean[k] = v
return clean
You didn't describe exactly what's wrong, but from inspection your function clearly looks wrong to me because it doesn't check for nested dicts. You also haven't indicated what your expected output would be, but the following looks correct to me.
from pprint import pprint
def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, dict):
            clean[k] = cleanNullTerms(v)
        elif isinstance(v, list):
            clean[k] = [cleanNullTerms(values) for values in v]
        elif v != 'nan':
            clean[k] = v
    return clean
data = {'col1': 'val1',
'col2': 'val2',
'col3': 'val3',
'col4': 'val3',
'list1': [{'l1': 'v1', 'l2': 'nan'},
{'K1': 'Kv1', 'K2': 'nan'},
{'M1': 'Mv1',
'M2': 'nan',
'sublist1': [{'SL1': 'SV1', 'SL2': 'nan'}]}],
'list2': [{'l1': 'v1', 'l2': 'nan'},
{'K1': 'Kv1', 'K2': 'nan'},
{'M1': 'Mv1',
'M2': 'nan',
'sublist2': [{'SL1': 'SV1', 'SL2': 'nan'}]}]}
data = cleanNullTerms(data)
pprint(data, sort_dicts=False)
Results printed:
{'col1': 'val1',
'col2': 'val2',
'col3': 'val3',
'col4': 'val3',
'list1': [{'l1': 'v1'},
{'K1': 'Kv1'},
{'M1': 'Mv1', 'sublist1': [{'SL1': 'SV1'}]}],
'list2': [{'l1': 'v1'},
{'K1': 'Kv1'},
{'M1': 'Mv1', 'sublist2': [{'SL1': 'SV1'}]}]}
Your code put the values of the list in the place where the list had been. If there were many values, each one overwrote the previous one, so only the last was saved. It also did not recurse into dictionaries. The version below still only removes "nan"s that are nested in dicts and lists; for example, it does not go into sets. If you want it to go into sets, write a comment and I can add this.
def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, list):
            v2 = []
            for value0 in v:
                if isinstance(value0, dict):
                    # recurse only into dicts; plain values have no .items()
                    v2.append(cleanNullTerms(value0))
                elif value0 != 'nan':
                    v2.append(value0)
            clean[k] = v2
        elif isinstance(v, dict):
            clean[k] = cleanNullTerms(v)
        elif v != 'nan':
            clean[k] = v
    return clean
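A more compact variant, sketched under the assumption that only dicts and lists need to be traversed, treats the object passed in as any kind of value, so lists may also hold plain strings or numbers. The name `drop_nan` and the `marker` parameter are hypothetical.

```python
def drop_nan(obj, marker='nan'):
    """Recursively remove dict entries and list elements equal to marker."""
    if isinstance(obj, dict):
        return {k: drop_nan(v, marker) for k, v in obj.items() if v != marker}
    if isinstance(obj, list):
        return [drop_nan(v, marker) for v in obj if v != marker]
    # Anything else (str, int, ...) is returned unchanged
    return obj


sample = {'a': 'nan',
          'b': [{'c': 'nan', 'd': 1}, 'nan', 2],
          'e': {'f': 'nan', 'g': ['x']}}
print(drop_nan(sample))  # {'b': [{'d': 1}, 2], 'e': {'g': ['x']}}
```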
While I've been improving my Python skills I have one question.
My code is below:
# def invertDictionary(dict):
#     new_dict = {}
#     for key, value in dict.items():
#         if value in new_dict:
#             new_dict[value].append(key)
#         else:
#             new_dict[value]=[key]
#     return new_dict
def invertDictionary(dict):
    new_dict = {value:([key] if value else [key]) for key, value in dict.items()}
    return new_dict;
invertDictionary({'a':3, 'b':3, 'c':3})
I am trying to get output like {3:['a','b','c']}. I have achieved that using a normal for-loop; I just want to know how to get these results using a Dictionary Comprehension. I tried but in append it's getting an error. Please let me know how to achieve this.
Thanks in Advance!
You missed that you also need a list comprehension to build the list.
Iterate over the values in the dict, and build the needed list of keys for each one.
Note that this is a quadratic process, whereas the canonical (and more readable) for loop is linear.
d = {'a':3, 'b':3, 'c':3, 'e':4, 'f':4, 'g':0}
inv_dict = {v: [key for key, val in d.items() if val == v]
for v in set(d.values())}
result:
{0: ['g'],
3: ['a', 'b', 'c'],
4: ['e', 'f']
}
Will this do?
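For comparison, the linear loop mentioned above can be sketched with `dict.setdefault`, which touches each item exactly once:

```python
d = {'a': 3, 'b': 3, 'c': 3, 'e': 4, 'f': 4, 'g': 0}

inv_dict = {}
for key, value in d.items():
    # Create the list on first sight of a value, then append the key
    inv_dict.setdefault(value, []).append(key)

print(inv_dict)  # {3: ['a', 'b', 'c'], 4: ['e', 'f'], 0: ['g']}
```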
While your original version with a regular for loop is the best solution for this, here is a variation on @Prune's answer that doesn't go over the dict multiple times:
>>> import itertools
>>> d = {'a':3, 'b':3, 'c':3, 'e':4, 'f':4, 'g':0}
>>> {group_key:[k for k,_ in dict_items]
for group_key,dict_items in itertools.groupby(
sorted(d.items(),key=lambda x:x[-1]),
key=lambda x:x[-1]
)
}
{0: ['g'], 3: ['a', 'b', 'c'], 4: ['e', 'f']}
>>>
First we sort the items of the dict by value, passing sorted a lambda key function that extracts the value part of each item tuple. Then we use groupby with the same key function to group the items that share a value, and finally a list comprehension extracts just the keys.
--
As noted by Kelly, we can make this shorter by using the dict's get method as the key function and relying on the fact that iterating over a dict gives you its keys:
>>> {k: list(g) for k, g in itertools.groupby(sorted(d, key=d.get), d.get)}
{0: ['g'], 3: ['a', 'b', 'c'], 4: ['e', 'f']}
>>>
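The sorting step matters because `itertools.groupby` only groups consecutive items with the same key. A small sketch with a deliberately unordered dict shows the difference:

```python
from itertools import groupby

d = {'a': 3, 'e': 4, 'b': 3}

# Without sorting, the two keys with value 3 land in separate groups
unsorted_groups = [(k, list(g)) for k, g in groupby(d, key=d.get)]
print(unsorted_groups)  # [(3, ['a']), (4, ['e']), (3, ['b'])]

# After sorting by value, equal values are adjacent and form one group
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(d, key=d.get), key=d.get)]
print(sorted_groups)  # [(3, ['a', 'b']), (4, ['e'])]
```

In a dict comprehension, the unsorted version would silently keep only the last group for each repeated value.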
You could use a defaultdict and the append method.
from collections import defaultdict
dict1 = {'a': 3, 'b': 3, 'c': 3}
dict2 = defaultdict(list)
{dict2[v].append(k) for k, v in dict1.items()}
dict2
>>> defaultdict(list, {3: ['a', 'b', 'c']})
I would like to know whether there is a Python function that merges two dictionaries and combines all values that have a common key.
I have found functions to append two dicts and to merge two dicts, but not to combine their values.
Example:
D1 = [{k1: v01}, {k3: v03}, {k4: v04},}],
D2 = [{k1: v11}, {k2: v12}, {k4: v14},}],
this should be the expected result:
D3 = [
{k1: [v01, v11]},
{k2: [ v12]},
{K3: [v03 ]},
{k4: [v04, v14]},
]
There is no built-in function for this, but you can use a defaultdict:
from collections import defaultdict
d = defaultdict(list)
# d1 and d2 are the two dicts to merge
for other in [d1, d2]:
    for k, v in other.items():
        d[k].append(v)
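A self-contained run of this approach might look like the sketch below. The sample dicts reuse the key and value names from the question as plain strings, which is an assumption about the intended data.

```python
from collections import defaultdict

# Assumed sample data: plain dicts rather than lists of one-item dicts
d1 = {'k1': 'v01', 'k3': 'v03', 'k4': 'v04'}
d2 = {'k1': 'v11', 'k2': 'v12', 'k4': 'v14'}

merged = defaultdict(list)
for other in (d1, d2):
    for k, v in other.items():
        merged[k].append(v)

# Convert back to a plain dict so missing keys raise KeyError again
merged = dict(merged)
print(merged)
# {'k1': ['v01', 'v11'], 'k3': ['v03'], 'k4': ['v04', 'v14'], 'k2': ['v12']}
```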
A solution, without importing anything:
# First initialize data, done correctly here.
D1 = [{'k1': 'v01'}, {'k3': 'v03'}, {'k4': 'v04'}]
D2 = [{'k1': 'v11'}, {'k2': 'v12'}, {'k4': 'v14'}]
# Get all unique keys
keys = {k for d in [*D1, *D2] for k in d}
# Initialize an empty dict
D3 = {x:[] for x in keys}
# sort to maintain order
D3 = dict(sorted(D3.items()))
#Iterate and extend
for x in [*D1, *D2]:
    for k,v in x.items():
        D3[k].append(v)
# NOTE: I do not recommend you convert a dictionary into a list of records.
# Nonetheless, here is how it would be done.
# To convert to a list
D3_list = [{k:v} for k,v in D3.items()]
print(D3_list)
# [{'k1': ['v01', 'v11']},
# {'k2': ['v12']},
# {'k3': ['v03']},
# {'k4': ['v04', 'v14']}]
If you meant to use actual dicts, instead of lists of dicts, this is easier.
D1 = dict(k1=1, k3=3, k4=4)
D2 = dict(k1=11, k2=12, k4=14)
There isn't a simple built-in function to do this, but the setdefault method is close.
It tries to get the given key, but creates it if it doesn't exist.
D3 = {}
for k, v in D1.items() | D2.items():
    D3.setdefault(k, set()).add(v)
And the result.
{'k4': {4, 14}, 'k1': {1, 11}, 'k3': {3}, 'k2': {12}}
This all assumes the order doesn't matter, just combining sets.
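If order or repeated values do matter, one possible variation is to collect lists instead of sets and to walk the dicts in sequence with `itertools.chain`. In this sketch `D2` deliberately repeats the pair `k1=1` (a change from the data above) to show that the list version keeps both occurrences.

```python
from itertools import chain

D1 = dict(k1=1, k3=3, k4=4)
D2 = dict(k1=1, k2=12, k4=14)  # k1=1 repeated on purpose

D3 = {}
# chain() visits D1's items first, then D2's, so insertion order is preserved
for k, v in chain(D1.items(), D2.items()):
    D3.setdefault(k, []).append(v)

print(D3)  # {'k1': [1, 1], 'k3': [3], 'k4': [4, 14], 'k2': [12]}
```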
A more generic way to merge dicts together may look like this.
(To answer a similar SO question)
def merge_dicts(combiner, dicts):
    new_dict = {}
    for d in dicts:
        for k, v in d.items():
            if k in new_dict:
                # key seen before: combine the stored value with the new one
                new_dict[k] = combiner(new_dict[k], v)
            else:
                new_dict[k] = v
    return new_dict
x = {'a': 'A', 'b': 'B'}
y = {'b': 'B', 'c': 'C'}
z = {'a': 'A', 'd': 'D'}
merge_dicts(combiner=lambda x, y: f'{x} AND {y}', dicts=(x, y, z))
# {'a': 'A AND A', 'b': 'B AND B', 'c': 'C', 'd': 'D'}
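Because the combiner is just a two-argument callable, existing functions can be passed in directly. The sketch below repeats the helper so that it runs on its own; the stock data is invented for illustration.

```python
import operator


def merge_dicts(combiner, dicts):
    new_dict = {}
    for d in dicts:
        for k, v in d.items():
            # Combine with the stored value if the key was already seen
            new_dict[k] = combiner(new_dict[k], v) if k in new_dict else v
    return new_dict


stock_a = {'apples': 3, 'pears': 1}
stock_b = {'apples': 2, 'plums': 7}

total = merge_dicts(operator.add, (stock_a, stock_b))
print(total)    # {'apples': 5, 'pears': 1, 'plums': 7}

largest = merge_dicts(max, (stock_a, stock_b))
print(largest)  # {'apples': 3, 'pears': 1, 'plums': 7}
```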
Say I have the following dictionary.
>> sample_dict = {"1": ['a','b','c'], "2": ['d','e','f'], "3": ['g','h','a']}
I would like to find a way to look at the values of each key and report whether any of the value lists share a duplicate element.
For example it would output:
>> [["1","3"] , ['a']]
I've looked at a few of the posts here and tried to use and/or change them to accomplish this, however none of what I have found has worked as intended. They would work if it was as follows:
>> sample_dict = {"1": ['a','b','c'], "2": ['d','e','f'], "3": ['a','b','c']}
but not if only a single value within the list was the same.
You could use another dictionary to map the values to the lists of corresponding keys. Then just select the values that map to more than one key, e.g.:
from collections import defaultdict
sample_dict = {'1': ['a','b','c'], '2': ['d','e','f'], '3': ['g','h','a']}
d = defaultdict(list) # automatically initialize every value to a list()
for k, v in sample_dict.items():
    for x in v:
        d[x].append(k)
for k, v in d.items():
    if len(v) > 1:
        print([v, k])
Output:
[['1', '3'], 'a']
If the list elements are hashable, you can use .setdefault to build an inverse mapping like so:
>>> sample_dict = {"1": ['a','b','c'], "2": ['d','e','f'], "3": ['g','h','a']}
>>> aux = {}
>>> for k, v in sample_dict.items():
...     for i in v:
...         aux.setdefault(i, []).append(k)
...
>>> [[v, k] for k, v in aux.items() if len(v) > 1]
[[['1', '3'], 'a']]
Dictionaries map from keys to values, not from values to keys. But you can write a function for one-off calculations. This will incur O(n) time complexity and is not recommended for larger dictionaries:
def find_keys(d, val):
    return [k for k, v in d.items() if val in v]
res = find_keys(sample_dict, 'a') # ['1', '3']
If you do this often, I recommend you "invert" your dictionary via collections.defaultdict:
from collections import defaultdict
dd = defaultdict(list)
for k, v in sample_dict.items():
    for w in v:
        dd[w].append(k)
print(dd)
defaultdict(<class 'list'>, {'a': ['1', '3'], 'b': ['1'], 'c': ['1'], 'd': ['2'],
'e': ['2'], 'f': ['2'], 'g': ['3'], 'h': ['3']})
This costs O(n) for the inversion, as well as additional memory, but now allows you to access the keys associated with an input value in O(1) time, e.g. dd['a'] will return ['1', '3'].
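One detail worth keeping in mind with this approach: indexing a `defaultdict` with a missing key inserts that key. A sketch of the lookup side, using `.get` to avoid growing the mapping when probing for values that may not exist:

```python
from collections import defaultdict

sample_dict = {'1': ['a', 'b', 'c'], '2': ['d', 'e', 'f'], '3': ['g', 'h', 'a']}

dd = defaultdict(list)
for k, v in sample_dict.items():
    for w in v:
        dd[w].append(k)

print(dd['a'])          # ['1', '3']
print(dd.get('z', []))  # [] and 'z' is not added
print('z' in dd)        # False

dd['z']                 # plain indexing creates an empty list for 'z'
print('z' in dd)        # True
```

Converting with `dict(dd)` once the inversion is done is another way to get ordinary `KeyError` behaviour back.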
You can use defaultdict from the collections module to do this.
For example:
from collections import defaultdict
sample_dict = {"1": ['a','b','c'], "2": ['d','e','f'], "3": ['g','h','a']}
d = defaultdict(list)
for keys, vals in sample_dict.items():
    for v in vals:
        d[v].append(keys)
print(d)
d will be a dict whose keys are the individual list elements and whose values are the lists of sample_dict keys under which each element appears; an element that maps to more than one key is a repeated one.
The output of the above code is defaultdict(list,{'a': ['1', '3'],'b': ['1'],'c': ['1'],'d': ['2'],'e': ['2'],'f': ['2'],'g': ['3'],'h': ['3']})
Although it is possible to produce the output in the form you asked for, that is generally not recommended: the task is to find which element is repeated under which keys, and that mapping is a natural job for a dictionary.