Group dict elements based on a specified dict value - python

I have this data:
data = {
    "a": {
        "00066554466": {
            "Id": 650,
            "Passwd": "e1c2a06545de9164d7e87cd98bed57c5",
            "zone": "Europe/Zurich"
        },
        "8745212300": {
            "Id": 400,
            "Passwd": "ecb95502daace7f46bf12b484d086e5b",
            "zone": "Europe/Zurich"
        },
        "8745212301": {
            "Id": 401,
            "Passwd": "ecb95502daace7f46bf12b484d086e5b",
            "zone": "Europe/Zurich"
        },
        "8745212302": {
            "DevId": 402,
            "Passwd": "ecb95502daace7f46bf12b484d086e5b",
            "zone": "Europe/Zurich"
        }
    }
}
I would like to group the keys that share the same Passwd, so the result should look like this:
{
"e1c2a06545de9164d7e87cd98bed57c5":[
"00066554466"
],
"ecb95502daace7f46bf12b484d086e5b":[
"8745212300",
"8745212301",
"8745212302"
]
}
I tried with itertools.groupby and with for k,v in xxx, but the result is never what I need.

itertools.groupby works well when the data is already sorted so that the values to be grouped are consecutive, which might not always be the case with your data.
Rather, use dict.setdefault and a nested loop:
out = {}
for d1 in data.values():
    for k, d2 in d1.items():
        out.setdefault(d2['Passwd'], []).append(k)
print(out)
Variant with a defaultdict:
from collections import defaultdict
out = defaultdict(list)
for d1 in data.values():
    for k, d2 in d1.items():
        out[d2['Passwd']].append(k)
print(dict(out))
Output:
{'e1c2a06545de9164d7e87cd98bed57c5': ['00066554466'],
'ecb95502daace7f46bf12b484d086e5b': ['8745212300', '8745212301', '8745212302']}
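If you still want to use itertools.groupby, one possible sketch is to flatten the nested dict into (Passwd, key) pairs and sort them first, so that equal passwords end up next to each other. The shortened sample data and the names pairs and grouped are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Shortened, hypothetical sample in the same shape as the question's data.
data = {
    "a": {
        "8745212300": {"Id": 400, "Passwd": "pw_b"},
        "00066554466": {"Id": 650, "Passwd": "pw_a"},
        "8745212301": {"Id": 401, "Passwd": "pw_b"},
    }
}

# Flatten to (Passwd, key) pairs and sort so equal passwords are adjacent.
pairs = sorted(
    (d2["Passwd"], k) for d1 in data.values() for k, d2 in d1.items()
)

# groupby only merges consecutive items, which is why the sort is required.
grouped = {
    passwd: [k for _, k in group]
    for passwd, group in groupby(pairs, key=itemgetter(0))
}
print(grouped)
```

The sort makes this O(n log n), while the setdefault and defaultdict loops above are a single pass, so this is mainly useful if you need the groups in sorted order anyway.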

One solution, though probably not the most Pythonic one, is simply:
passwd_group = {}
for k, val in data["a"].items():
    if val["Passwd"] not in passwd_group:
        passwd_group[val["Passwd"]] = []
    passwd_group[val["Passwd"]].append(k)

This may not be ideal but got it working.
new_dict = {}
for L in data.keys():
    x = data[L]
    for M in x.keys():
        y = x[M]
        for N in y.keys():
            if N == "Passwd":
                new_list = new_dict.get(y[N], [])
                new_list.append(M)
                new_dict[y[N]] = new_list
print(new_dict)

Related

Transform dictionary to map values to list of keys

For example I have a dictionary like this:
my_dict = {
'name_1': 'method_name_x',
'name_2': 'method_name_x',
'name_3': 'method_name_y',
}
(keys and values of the dictionary are simply strings)
I want to transform this dictionary so that each value is mapped to a list of the keys that have this value.
Example result:
my_transformed_dict = {
'method_name_x': ['name_1', 'name_2'],
'method_name_y': ['name_3'],
}
I could do this by the following code:
my_transformed_dict = dict.fromkeys(my_dict.values(), [])
for k, v in my_dict.items():
    my_transformed_dict[v].append(k)
But this ends up adding every key to every value somehow.
I also thought of using dict.setdefault(), like this:
my_transformed_dict = dict()
for k, v in my_dict.items():
    my_transformed_dict.setdefault(v, []).append(k)
This works as intended, but:
What would be the best practice to solve this?
Is there a simpler way to solve this (maybe using a library), or a way to write it as a readable one-liner?
You can use itertools.groupby, for example:
from itertools import groupby
my_dict = {
'name_1': 'method_name_x',
'name_2': 'method_name_x',
'name_3': 'method_name_y',
}
print({k: list(v) for k, v in groupby(sorted(my_dict, key=lambda k: my_dict[k]), key=lambda k: my_dict[k])})
Output:
{'method_name_x': ['name_1', 'name_2'], 'method_name_y': ['name_3']}
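As a side note on the dict.fromkeys attempt in the question: it adds every key to every value because dict.fromkeys evaluates the [] argument once and stores that same list object under every key. A small sketch to illustrate (the names shared and separate are only for this example):

```python
my_dict = {
    'name_1': 'method_name_x',
    'name_2': 'method_name_x',
    'name_3': 'method_name_y',
}

# dict.fromkeys stores the very same list object under every key.
shared = dict.fromkeys(my_dict.values(), [])
print(shared['method_name_x'] is shared['method_name_y'])  # True

# A dict comprehension evaluates [] once per key, giving independent lists.
separate = {v: [] for v in my_dict.values()}
for k, v in my_dict.items():
    separate[v].append(k)
print(separate)
```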

how to change tuple key into multilevel dict

I have a dictionary that looks like this:
d = {key1 : {(key2,key3) : value}, ...}
so it is a dictionary of dictionaries and in the inside dict the keys are tuples.
I would like to get a triple nested dict:
{key1 : {key2 : {key3 : value}, ...}, ...}
I know how to do it with 2 loops and a condition:
new_d = {}
for key1, inside_dict in d.items():
    new_d[key1] = {}
    for (key2,key3), value in inside_dict.items():
        if key2 in new_d[key1].keys():
            new_d[key1][key2][key3] = value
        else:
            new_d[key1][key2] = {key3 : value}
Edit: key2 values are not guaranteed to be unique. This is why I added the condition
It feels very unpythonic to me.
Is there a faster and/or shorter way to do this?
You could use the common trick for nesting dicts arbitrarily, using collections.defaultdict:
from collections import defaultdict
tree = lambda: defaultdict(tree)
new_d = tree()
for k1, dct in d.items():
    for (k2, k3), val in dct.items():
        new_d[k1][k2][k3] = val
If I understand the problem correctly, for this case you can wrap all the looping up in a dict comprehension. This assumes that your data is unique:
data = {"key1": {("key2", "key3"): "val"}}
{k: {keys[0]: {keys[1]: val}} for k,v in data.items() for keys, val in v.items()}
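If you prefer plain dicts over a defaultdict tree, a chained dict.setdefault is one possible alternative. The sketch below uses made-up sample data in which key2 repeats, and it also shows what the dict comprehension does in that case: the last inner entry overwrites the earlier ones.

```python
# Hypothetical sample: key2 "a" appears twice under key1.
d = {"key1": {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 3}}

# Chained setdefault creates each nesting level only when it is missing.
new_d = {}
for key1, inside_dict in d.items():
    for (key2, key3), value in inside_dict.items():
        new_d.setdefault(key1, {}).setdefault(key2, {})[key3] = value
print(new_d)

# The comprehension rebuilds the value for key1 on every inner item,
# so only the last (key2, key3) pair survives.
flat = {k: {keys[0]: {keys[1]: val}} for k, v in d.items() for keys, val in v.items()}
print(flat)
```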

Combine python dictionaries that share values and keys

I am doing some entity matching based on string edit distance and my results are a dictionary with keys (query string) and values [list of similar strings] based on some scoring criteria.
for example:
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
Each value also has a corresponding dictionary item, for which it is the key (e.g. 'carl' and 'karl').
I need to combine the elements that have shared values, choosing one value as the new key (let's say the longest string). In the above example I would hope to get:
results = {
'benjamin': ['ben', 'benj', 'benyamin', 'beny', 'benjamin', 'benyamin'],
'carl': ['carl','karl']
}
I have tried iterating through the dictionary using the keys, but I can't wrap my head around how to iterate and compare through each dictionary item and its list of values (or single value).
This is one solution using collections.defaultdict and sets.
The desired output is very similar to what you have, and can be easily manipulated to align.
from collections import defaultdict
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': 'benyamin',
'benyamin': 'benjamin',
'carl': 'karl',
'karl': 'carl',
}
d = defaultdict(set)
for i, (k, v) in enumerate(results.items()):
    w = {k} | (set(v) if isinstance(v, list) else {v})
    for m, n in d.items():
        if not n.isdisjoint(w):
            d[m].update(w)
            break
    else:
        d[i] = w
result = {max(v, key=len): v for k, v in d.items()}
# {'benjamin': {'ben', 'benj', 'benjamin', 'beny', 'benyamin'},
#  'carl': {'carl', 'karl'}}
Credit to @IMCoins for the idea of manipulating v to w in second loop.
Explanation
There are 3 main steps:
1. Convert values into a consistent set format, including keys and values from original dictionary.
2. Cycle through this dictionary and add values to a new dictionary. If there is an intersection with some key [i.e. sets are not disjoint], then use that key. Otherwise, add to new key determined via enumeration.
3. Create result dictionary in a final transformation by mapping max length key to values.
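One caveat with the loop above: a new set is merged into the first existing group it overlaps, so an entry that bridges two already separate groups would not join them. A different technique, sketched here as an assumption rather than as part of the answer, is to treat the strings as nodes of a graph and collect connected components. The function name merge_groups and the tie-break rule (longest string, then alphabetical order) are choices made for this example.

```python
def merge_groups(results):
    # Build an undirected graph linking every key to its similar strings.
    graph = {}
    for key, values in results.items():
        graph.setdefault(key, set()).update(values)
        for value in values:
            graph.setdefault(value, set()).add(key)

    seen = set()
    merged = {}
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        # Longest string wins; ties are broken alphabetically for determinism.
        label = min(component, key=lambda s: (-len(s), s))
        merged[label] = component
    return merged


results = {
    'ben': ['benj', 'benjamin', 'benyamin'],
    'benj': ['ben', 'beny', 'benjamin'],
    'benjamin': ['benyamin'],
    'benyamin': ['benjamin'],
    'carl': ['karl'],
    'karl': ['carl'],
}
print(merge_groups(results))
```

The explicit tie-break matters because 'benjamin' and 'benyamin' have the same length, and max(v, key=len) over a set returns whichever of them happens to be iterated first.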
EDIT : Even though performance was not the question here, I took the liberty to perform some tests between jpp's answer, and mine... here is the full script. My script performs the tests in 17.79 seconds, and his in 23.5 seconds.
import timeit
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
def imcoins(results):
    new_dict = {}
    for k, v in results.items():
        flag = False
        # Checking if key exists...
        if k not in new_dict.keys():
            # But then, we also need to check its values.
            for item in v:
                if item in new_dict.keys():
                    # If we update, set the flag to True, so we don't create a new value.
                    new_dict[item].update(v)
                    flag = True
            if flag == False:
                new_dict[k] = set(v)
    # Now, to sort our newly created dict...
    sorted_dict = {}
    for k, v in new_dict.items():
        max_string = max(v)
        if len(max_string) > len(k):
            sorted_dict[max(v, key=len)] = set(v)
        else:
            sorted_dict[k] = v
    return sorted_dict

def jpp(results):
    from collections import defaultdict
    res = {i: {k} | (set(v) if isinstance(v, list) else {v})
           for i, (k, v) in enumerate(results.items())}
    d = defaultdict(set)
    for i, (k, v) in enumerate(res.items()):
        for m, n in d.items():
            if n & v:
                d[m].update(v)
                break
        else:
            d[i] = v
    result = {max(v, key=len): v for k, v in d.items()}
    return result

iterations = 1000000
time1 = timeit.timeit(stmt='imcoins(results)', setup='from __main__ import imcoins, results', number=iterations)
time2 = timeit.timeit(stmt='jpp(results)', setup='from __main__ import jpp, results', number=iterations)
print(time1) # Outputs : 17.7903265883
print(time2) # Outputs : 23.5605850732
If I move the import from his function to global scope, it gives...
imcoins : 13.4129249463 seconds
jpp : 21.8191823393 seconds

Remove duplicates and combine multiple lists into one?

How do I remove duplicates and combine multiple lists into one like so:
function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]) should return exactly:
[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
The easiest way would be to use a defaultdict.
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i,j in l:
...     d[i].append(j) #append value to the key
...
>>> d
=> defaultdict(<class 'list'>, {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'],
'rep': ['money.txt']})
#to get it in a list
>>> out = [ [key,d[key]] for key in d]
>>> out
=> [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
#driver values :
IN : l = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
Try This ( no library needed ):
your_input_data = [ ["hello","me.txt"], ["good","me.txt"], ["good","me.txt"], ["good","money.txt"], ["rep", "money.txt"] ]
my_dict = {}
for box in your_input_data:
    if box[0] in my_dict:
        buffer_items = []
        for items in box[1:]:
            if items not in my_dict[box[0]]:
                buffer_items.append(items)
        remove_dup = list(set(buffer_items + my_dict[box[0]]))
        my_dict[box[0]] = remove_dup
    else:
        buffer_items = []
        for items in box[1:]:
            buffer_items.append(items)
        remove_dup = list(set(buffer_items))
        my_dict[box[0]] = remove_dup
last_point = [[keys, values] for keys, values in my_dict.items()]
print(last_point)
Good Luck ...
You can do it with traditional dictionaries too.
In [29]: d2 = {}
In [30]: l1 = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
In [31]: for i, j in l1:
    ...:     if i not in d2:
    ...:         d2[i] = [j]
    ...:     else:
    ...:         d2[i].append(j)
    ...:
In [32]: d2
Out[32]: {'good': ['me.txt', 'money.txt'], 'hello': ['me.txt'], 'rep': ['money.txt']}
In [33]: out = [ [key,d2[key]] for key in d2]
In [34]: out
Out[34]:
[['rep', ['money.txt']],
 ['hello', ['me.txt']],
 ['good', ['me.txt', 'money.txt']]]
Let's first understand the underlying problem.
There is a common pattern for this kind of list problem. Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
and you want to convert it to a dict that uses the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples share a key but have different values, like (2006,1) and (2006,5). You want all of those values collected under the one key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict and then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])
So we first check whether the key is already in the new dict. If it is not, we create a list for it; if it is, we append the value to the existing list.
Full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
Solution to your actual problem:
list_1=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
no_dublicates={}
for item in list_1:
    if item[0] not in no_dublicates:
        no_dublicates[item[0]]=["".join(item[1:])]
    else:
        no_dublicates[item[0]].extend(item[1:])
list_result=[]
for key,value in no_dublicates.items():
    list_result.append([key,value])
print(list_result)
output:
[['hello', ['me.txt']], ['rep', ['money.txt']], ['good', ['me.txt', 'money.txt']]]
yourList=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
expectedList=[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
def getall(allsec, listKey, uniqlist):
    if listKey not in uniqlist:
        uniqlist.append(listKey)
        return [listKey, [x[1] for x in allsec if x[0] == listKey]]
uniqlist=[]
result=sorted(list(filter(lambda x:x!=None, [getall(yourList,elem[0],uniqlist) for elem in yourList])))
print(result)
hope this helps
This can easily be solved using a dict and sets.
def combine_duplicates(given_list):
    data = {}
    for element_1, element_2 in given_list:
        # set.add() returns None, so add in place via setdefault instead of assigning its result
        data.setdefault(element_1, set()).add(element_2)
    return [[k, list(v)] for k, v in data.items()]
Using Python to create a function that gives you the exact required output can be done as follows:
from collections import defaultdict
def function(data):
    entries = defaultdict(list)
    for k, v in data:
        entries[k].append(v)
    return sorted([k, v] for k, v in entries.items())
print(function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]))
The output is sorted before being returned as per your requirement. This would display the return from the function as:
[['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
It also ensures that the keys are sorted. A dictionary is used to deal with the removal of duplicates (as keys need to be unique).
A defaultdict() is used to simplify the building of lists within the dictionary. The alternative would be to try and append a new value to an existing key, and if there is a KeyError exception, then add the new key instead as follows:
def function(data):
    entries = {}
    for k, v in data:
        try:
            entries[k].append(v)
        except KeyError as e:
            entries[k] = [v]
    return sorted([k, v] for k, v in entries.items())
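Note that appending to a list keeps a value twice if the same pair occurs twice in the input. If the values also have to be unique, one possible sketch (the name combine_unique is made up here) uses an inner dict as an insertion-ordered set, which relies on dicts preserving insertion order in Python 3.7+:

```python
def combine_unique(pairs):
    grouped = {}
    for key, value in pairs:
        # The inner dict acts as an ordered set: a repeated value is stored once.
        grouped.setdefault(key, {})[value] = None
    # Sort by key to match the order requested in the question.
    return sorted([key, list(values)] for key, values in grouped.items())


pairs = [["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
         ["good", "money.txt"], ["rep", "money.txt"]]
print(combine_unique(pairs))
```

Unlike converting through set(), this keeps the values in the order in which they were first seen.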
Create an empty array, push the element at index 0 of each child array into it, and join the values into a single space-separated string.
var your_input_data = [ ["hello","hi", "jel"], ["good"], ["good2","lo"], ["good3","lt","ahhahah"], ["rep", "nice","gr8", "job"] ];
var myprint = [];
for (var i in your_input_data) {
    myprint.push(your_input_data[i][0]);
}
console.log(myprint.join(' '));

create dict combining two other dicts

What is the best way to create a dict from two other dicts (very big one and small one)?
We have:
big_dict = {
'key1':325,
'key2':326,
'key3':327,
...
}
small_dict = {
325:0.698,
326:0.684,
327:0.668
}
We need a dict that holds the values from small_dict but uses the keys from big_dict:
comb_dict = {
'key1':0.698,
'key2':0.684,
'key3':0.668
}
The following code works with all cases (example shown in the driver values), with a more EAFP oriented approach.
>>> d = {}
>>> for key,val in big_dict.items():
...     try:
...         d[key] = small_dict[val]
...     except KeyError:
...         continue
...
>>> d
=> {'key1': 0.698, 'key2': 0.684, 'key3': 0.668}
#driver values :
IN : big_dict = {
'key1':325,
'key2':326,
'key3':327,
'key4':330 #note that small_dict[330] will give KeyError
}
IN : small_dict = {
325:0.698,
326:0.684,
327:0.668
}
Or, using Dictionary Comprehension :
>>> {key:small_dict[val] for key,val in big_dict.items() if val in small_dict}
=> {'key1': 0.698, 'key2': 0.684, 'key3': 0.668}
If there are values in big_dict that may not be present as keys in small_dict, this will work:
combined_dict = {}
for big_key, small_key in big_dict.items():
    combined_dict[big_key] = small_dict.get(small_key)
Or you might want to use a different default value instead with:
    combined_dict[big_key] = small_dict.get(small_key, 'XXX')
Or you might want to raise a KeyError to indicate a problem with your data:
    combined_dict[big_key] = small_dict[small_key]
Or you might want to skip missing keys:
    if small_key in small_dict:
        combined_dict[big_key] = small_dict[small_key]
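To compare these options side by side, here is a small sketch using the sample dicts plus an extra 'key4' entry that has no match in small_dict. The variable names are only illustrative. Note that dict.get takes its default as a positional argument, not as a keyword.

```python
big_dict = {'key1': 325, 'key2': 326, 'key3': 327, 'key4': 330}
small_dict = {325: 0.698, 326: 0.684, 327: 0.668}

# Missing values become None.
with_none = {k: small_dict.get(v) for k, v in big_dict.items()}

# Missing values become a chosen placeholder (passed positionally).
with_default = {k: small_dict.get(v, 'XXX') for k, v in big_dict.items()}

# Missing values are skipped entirely.
skipped = {k: small_dict[v] for k, v in big_dict.items() if v in small_dict}

print(with_none['key4'], with_default['key4'], 'key4' in skipped)
```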
You could use dictionary comprehension:
comb_dict = {k: small_dict[v] for k, v in big_dict.iteritems()}
If big_dict may contain values that are not keys in small_dict you could just ignore them:
comb_dict = {k: small_dict[v] for k, v in big_dict.iteritems() if v in small_dict}
or use the original value:
{k: (small_dict[v] if v in small_dict else v) for k, v in big_dict.iteritems()}
(Use items() in Python3)
keys = small_dict.keys()
combined_dict = {k:small_dict[v] for k,v in big_dict.items() if v in keys}
>>> combined_dict
{'key3': 0.668, 'key2': 0.684, 'key1': 0.698}
