How to get the max value from a dictionary based on conditions - python

I have a list of dictionaries, and I want to fetch the maximum floating-point 'confidence' value among the entries whose keys ('key') share the same prefix.
ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07', 'xpath': '/html/body/p[24]', 'confidence': 0.985},
{'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03', 'xpath': '/html/body/p[27]', 'confidence': 0.989},
{'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes', 'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
{'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes', 'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]
Here you can see that the first two dictionaries have the gdpr prefix, while the last two have data_collected.
I don't understand how to get the max value for each prefix.
I tried it this way:
lis = []
for i in ab:
    spl = i['key'].split('.')[0]
    i['key'] = spl
    if i['key'] == spl:
        lis.append(i['confidence'])
print(lis)
expected output should be: [0.989, 0.661]

I'm not sure why you want to get a list when your data is key-based. I'd use a dict myself, but then again, maybe you only want to compare neighbouring values, which you can do with itertools.groupby. I'll include both methods below.
dict
maxes = {}
for d in ab:
    confidence = d['confidence']
    spl = d['key'].split('.')[0]
    if spl not in maxes or confidence > maxes[spl]:
        maxes[spl] = confidence
print(maxes)
{'gdpr': 0.989, 'data_collected': 0.661}
groupby
from itertools import groupby
grouper = groupby(ab, lambda d: d['key'].split('.')[0])
maxes = [(k, max(d['confidence'] for d in g)) for k, g in grouper]
print(maxes)
[('gdpr', 0.989), ('data_collected', 0.661)]
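Here I'm keeping the keys, but you could very well discard them. A groupby object is a one-shot iterator, so it has to be built again before it is reused:
grouper = groupby(ab, lambda d: d['key'].split('.')[0])
lis = [max(d['confidence'] for d in g) for _k, g in grouper]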
print(lis)
[0.989, 0.661]
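One caveat about the groupby approach: itertools.groupby only merges items that sit next to each other. The sketch below uses made-up records (the shortened keys are hypothetical, not from the question) to show what happens when equal prefixes are not adjacent, and how sorting by the same key function first avoids it.

```python
from itertools import groupby

# Hypothetical data where equal prefixes are NOT adjacent
records = [
    {'key': 'gdpr.a.1', 'confidence': 0.985},
    {'key': 'data_collected.b.1', 'confidence': 0.562},
    {'key': 'gdpr.a.2', 'confidence': 0.989},
]


def prefix(d):
    """Return the part of the key before the first dot."""
    return d['key'].split('.')[0]


# Without sorting, 'gdpr' shows up as two separate groups
unsorted_maxes = [(k, max(d['confidence'] for d in g))
                  for k, g in groupby(records, prefix)]
print(unsorted_maxes)
# [('gdpr', 0.985), ('data_collected', 0.562), ('gdpr', 0.989)]

# Sorting by the same key function first makes equal prefixes adjacent
sorted_maxes = [(k, max(d['confidence'] for d in g))
                for k, g in groupby(sorted(records, key=prefix), prefix)]
print(sorted_maxes)
# [('data_collected', 0.562), ('gdpr', 0.989)]
```

Sorting costs O(n log n), so for unsorted input the plain dict approach is probably the simpler choice.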

Where you went wrong
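First, you split i['key'] and then assigned the prefix back to i['key']. That overwrites the original key and does nothing to group the entries.
Second, right after assigning spl to i['key'] you checked whether they are equal. Obviously they always will be, so every confidence gets appended.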
A right approach
Dictionary
highest_value_dict = {}
for i in ab:
    spl = i['key'].split('.')[0]
    # if no such key, then add it.
    # else check if this confidence is greater than the one stored in highest_value_dict
    if spl not in highest_value_dict or highest_value_dict[spl] < i['confidence']:
        highest_value_dict[spl] = i['confidence']
Output :
{'gdpr': 0.989, 'data_collected': 0.661}
If you really want the values as list
list(highest_value_dict.values())
Output :
[0.989, 0.661]

Something like the below. The idea is to use defaultdict that will map the key to the max confidence
from collections import defaultdict
ab = [{'key': 'gdpr.gdpr_compliance.1', 'value': 'Yes', 'idref': '69dbdba4-14d4-4ac8-a318-0d658e4d5b07',
       'xpath': '/html/body/p[24]', 'confidence': 0.985},
      {'key': 'gdpr.gdpr_compliance.2', 'value': 'Yes', 'idref': '69e2589a-bbf2-49c3-96fc-01fbee5dde03',
       'xpath': '/html/body/p[27]', 'confidence': 0.989},
      {'key': 'data_collected.personally_identifiable_information.1', 'value': 'Yes',
       'idref': 'f6819b54-07a7-4839-b0cc-8343eed28342', 'xpath': '/html/body/ul[6]/li[1]', 'confidence': 0.562},
      {'key': 'data_collected.personally_identifiable_information.2', 'value': 'Yes',
       'idref': '496400e5-9665-4697-96bc-c55176cdbd02', 'xpath': '/html/body/ul[6]/li[2]', 'confidence': 0.661}]
data = defaultdict(float)
for entry in ab:
    value = entry['confidence']
    key = entry['key'].split('.')[0]
    if data[key] < value:
        data[key] = value
for k, v in data.items():
    print(f'{k} -> {v}')
output
gdpr -> 0.989
data_collected -> 0.661
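A note on the default value: defaultdict(float) starts every key at 0.0, which is fine for confidences between 0 and 1 but would hide values that are all negative. A minimal sketch, assuming hypothetical negative scores (not part of the question's data), that starts from negative infinity instead:

```python
from collections import defaultdict

# Hypothetical (key, score) pairs where every score is negative
scores = [('a', -0.5), ('a', -0.2), ('b', -0.9)]

# Start each key at -inf so that any real value replaces the default
maxes = defaultdict(lambda: float('-inf'))
for key, value in scores:
    if value > maxes[key]:
        maxes[key] = value

print(dict(maxes))
# {'a': -0.2, 'b': -0.9}
```

With defaultdict(float) the same loop would leave both keys at 0.0, because no negative score is greater than the default.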

I suggest a solution with O(n) time and memory complexity:
from typing import List
def get_maximal_values(data: List[dict]) -> List[float]:
    # Create a generator that extracts (prefix, confidence) pairs
    preparing_data = ((x['key'].split('.')[0], x['confidence']) for x in data)
    # Find the maximum for each unique key
    result = {}
    for key, confidence in preparing_data:
        result[key] = max(result.get(key, 0), confidence)
    # Return only the confidence values
    return list(result.values())

Related

Find the index of the most frequent dict in a Python list

I have an array with some dictionaries in it.
The following approach works, but I have to do some more processing on the returned value to get the index, which I think is very bad.
Is there a better way?
data = [{'name': 'A'},
{'name': 'A'},
None,
None,
{'name': 'B'},
{'name': 'B'},
{'name': 'B'}]
process = list(map(lambda x: x.get('name') if isinstance(x, dict) else None, data))
result = max(process, key=process.count)
for _ in data:
    if isinstance(_, dict) and _['name'] == result:
        array_index = _
        break
print(data.index(array_index))
{'name': 'B'} appears the most times.
At which index of data does {'name': 'B'} first appear?
For the example above, I want to get 4.
But the code above has to loop over the data again, which I think is very bad.
Hey, I used GitHub Copilot to see how it would solve this problem:
def get_index_of_most_frequent_dict_value(data):
    """
    Return the index of the most frequent value in the data array
    """
    # Create a dictionary to store the frequency of each value
    frequency = {}
    for item in data:
        if item is None:
            continue
        if item['name'] in frequency:
            frequency[item['name']] += 1
        else:
            frequency[item['name']] = 1
    # Find the most frequent value
    most_frequent_value = None
    most_frequent_value_count = 0
    for key, value in frequency.items():
        if value > most_frequent_value_count:
            most_frequent_value = key
            most_frequent_value_count = value
    # Find the position of the most frequent value
    for i in range(len(data)):
        if data[i] is None:
            continue
        if data[i]['name'] == most_frequent_value:
            return i
Output:
4
Comparisons of time:
My solution (a): 5.499999999998562e-06 seconds
Your solution (b): 7.400000000004625e-06 seconds
a < b? Yes
Here is what you can do
import ast
data = [{'name': 'A'},
{'name': 'A'},
None,
None,
{'name': 'B'},
{'name': 'B'},
{'name': 'B'}]
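x = {str(y): data.count(y) for y in data}
# pick the key with the highest count; a bare max(x) would compare the strings alphabetically
j = ast.literal_eval(max(x, key=x.get))
print(data.index(j))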
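As an alternative sketch, the counting and the index lookup can be folded into one pass with collections.Counter. The function name index_of_most_frequent_name is made up for this illustration, and it assumes that every non-dict entry (such as None) should be skipped.

```python
from collections import Counter

data = [{'name': 'A'},
        {'name': 'A'},
        None,
        None,
        {'name': 'B'},
        {'name': 'B'},
        {'name': 'B'}]


def index_of_most_frequent_name(items):
    """Return the first index of the most frequent 'name', or None if there is none."""
    counts = Counter()
    first_index = {}
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            continue  # skip None and other non-dict entries
        name = item.get('name')
        counts[name] += 1
        # remember only the first position at which each name was seen
        first_index.setdefault(name, i)
    if not counts:
        return None
    name, _count = counts.most_common(1)[0]
    return first_index[name]


result_index = index_of_most_frequent_name(data)
print(result_index)
# 4
```

This reads the list once and avoids both process.count (which rescans the list for every element) and the second search loop.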

How to consolidate only select values in a dictionary with repeated keys

I have a list of dictionaries that contain entries that have the same key for 'name'. I want to merge all dictionaries with duplicate keys while retaining their name value but summing up their weights.
The code below will output:
{'name': ('jake', 'sully', 'jake', 'sully'), 'weight': 2.0}
import collections
# Initialising list of dictionaries
ini_dict = [{'name': ('jake', 'sully'), 'weight': 1.0}, {'name': ('jake', 'sully'), 'weight': 1.0}]
# printing initial dictionary
print("initial dictionary", str(ini_dict))
# sum the values with same keys
counter = collections.Counter()
for d in ini_dict:
    counter.update(d)
result = dict(counter)
print(result)
However, that concatenates the names as well, when what I am really going for is:
{'name': ('jake', 'sully'), 'weight': 2.0}
How can I achieve this?
You can use a normal dict:
result = {}
for d in ini_dict:
    result[d["name"]] = result.get(d["name"], 0) + d["weight"]
The result will be a dict with name as keys and weight as values:
{('jake', 'sully'): 2.0}
If you wanted it back in list form:
lst = [{"name":k, "weight":v} for k, v in result.items()]
The names in the tuple must be in the same order. ('jake', 'sully') != ('sully', 'jake').
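If the order of the names should not matter, one possible workaround is to normalise each tuple before using it as a key. The sketch below sorts the names; treating ('sully', 'jake') and ('jake', 'sully') as the same entry is an assumption, not something the question states.

```python
# Hypothetical input where the same two names appear in different order
ini_dict = [
    {'name': ('jake', 'sully'), 'weight': 1.0},
    {'name': ('sully', 'jake'), 'weight': 1.0},
]

result = {}
for d in ini_dict:
    # sort the names so both orderings map to the same dictionary key
    key = tuple(sorted(d['name']))
    result[key] = result.get(key, 0) + d['weight']

merged = [{'name': k, 'weight': v} for k, v in result.items()]
print(merged)
# [{'name': ('jake', 'sully'), 'weight': 2.0}]
```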

Convert a string to list of dictionaries

I have a string as below
gmr='rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'
If you look closely, it is really two dictionaries:
rule:unique,attribute:geo,name:unq1
and
rule:sum,attribute:sales,name:sum_sales
I want to convert it to the following:
[
    {'rule': 'sum', 'attribute': 'sales', 'name': 'sum_sales'},
    {'rule': 'unique', 'attribute': 'geo', 'name': 'unq1'}
]
Kindly help.
I tried:
gmr='rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'
dlist=[]
at_rule_gm=(x.split(':') for x in gmr.split(','))
dict(at_rule_gm)
but here I get only the last dictionary.
Start with sample of OP:
>>> gmr='rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'
Make an empty list first.
>>> dlist = [ ]
Loop with entry over list, yielded by gmr.split(','),
Store entry.split(':') into pair,
Check whether first value in pair (the key) is 'rule'
If so, append a new empty dictionary to dlist
Store pair into last entry of dlist:
>>> for entry in gmr.split(','):
...     pair = entry.split(':')
...     if pair[0] == 'rule':
...         dlist.append({ })
...     dlist[-1][pair[0]] = pair[1]
...
Print result:
>>> print(dlist)
[{'name': 'unq1', 'attribute': 'geo', 'rule': 'unique'},
{'name': 'sum_sales', 'attribute': 'sales', 'rule': 'sum'}]
Looks like what OP intended to get.
gmr='rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'
split_str = gmr.split(',')
dlist = []
for num in range(0, len(split_str), 3):
    temp_dict = {}
    temp1 = split_str[num]
    temp2 = split_str[num+1]
    temp3 = split_str[num+2]
    key, value = temp1.split(':')
    temp_dict.update({key: value})
    key, value = temp2.split(':')
    temp_dict.update({key: value})
    key, value = temp3.split(':')
    temp_dict.update({key: value})
    dlist.append(temp_dict)
dict always gives a single dictionary, not a list of dictionaries. For the latter, you can use a list comprehension after first splitting by 'rule:':
gmr = 'rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'
items = (f'rule:{x}' for x in filter(None, gmr.split('rule:')))
res = [dict(x.split(':') for x in item.split(',') if x) for item in items]
print(res)
# [{'attribute': 'geo', 'name': 'unq1', 'rule': 'unique'},
# {'attribute': 'sales', 'name': 'sum_sales', 'rule': 'sum'}]
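To see why the original attempt kept only the last group, it may help to look at what dict does with repeated keys: later pairs overwrite earlier ones. A small sketch using the question's own string:

```python
gmr = 'rule:unique,attribute:geo,name:unq1,rule:sum,attribute:sales,name:sum_sales'

# Six key/value pairs come out of the split ...
pairs = [entry.split(':') for entry in gmr.split(',')]
print(len(pairs))
# 6

# ... but only three distinct keys exist, so the later values win
collapsed = dict(pairs)
print(collapsed)
# {'rule': 'sum', 'attribute': 'sales', 'name': 'sum_sales'}
```

That is why the pairs have to be divided into groups first, and only then passed to dict one group at a time.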

Python: Split a list into smaller jsons based on another list

I have a list of program names which needs to be split into smaller lists of JSON objects based on a priority list. I need to do this in Python 3.
B and C both have priority 2, so they will be in a list together.
program_names = ['A','B','C','D']
priorities = [1,2,2,3]
Required end result:
[[{"name": "A"}], [{"name":"B"}, {"name":"C"}], [{"name":"D"}]]
Current code:
program_names_list = []
final_list = []
for x in program_names.split(','):
    program_names_list.append(x)
for x in program_names_list:
    final_list.append([{"name": x}])
That's what I currently have which is outputting the following result:
[[{'name': 'A'}], [{'name': 'B'}], [{'name': 'C'}], [{'name': 'D'}]]
I should add that program_names is a string "A,B,C,D"
Full solution
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
returns:
[[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
In steps
Create a dictionary that uses the priorities as keys and a list of all program names with corresponding priority as values:
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
Go through the sorted keys and create a new list of program names by getting them from the dictionary by the key:
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
Loop through the priorities and use a dictionary with priorities as keys and lists of programs as values to group all elements with the same priority.
In [24]: from collections import defaultdict
In [25]: program_names = ['A','B','C','D']
In [26]: priorities = [1,2,2,3]
In [27]: d = defaultdict(list)
In [28]: for p, name in zip(priorities, program_names):
   ....:     d[p].append({'name': name})
   ....:
In [29]: [d[p] for p in sorted(d)]
Out[29]: [[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
Use groupby.
from itertools import groupby
program_names = ['a','b','c','d']
priorities = [1,2,2,3]
data = zip(priorities, program_names)
groups_dict = []
for k, g in groupby(data, lambda x: x[0]):
    # build a real list; in Python 3 a bare map() would print as a map object
    m = [dict(name=x[1]) for x in g]
    groups_dict.append(m)
print(groups_dict)
Although this may be wrong from an educational point of view, I cannot resist answering such questions by one-liners:
[[{'name': p_n} for p_i, p_n in zip(priorities, program_names) if p_i == p] for p in sorted(set(priorities))]
(This also works when your "priorities" list is not sorted, but it rescans the whole list once per distinct priority, so it is less efficient than the "normal" approach with a defaultdict(list)).
Update: Borrowing from damn_c-s answer, here's an efficient one-liner (not counting the implied from itertools import groupby):
[[{'name': pn} for pi, pn in l] for v, l in groupby(zip(priorities, program_names), lambda x: x[0])]

Dictionary transformation and counter

Object:
data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
the result should be:
{'USA': {0: {'key':11}, 1: {'key': 12}}, 'Canada': {0: {'key':21}}}
I started experimenting with:
result = {}
for i in data:
    k = 0
    result[i['country']] = dict(k = dict(key=i['key']))
and I get:
{'Canada': {'k': {'key': 21}}, 'USA': {'k': {'key': 12}}}
So how can I put the counter instead k? Maybe there is a more elegant way to create the dictionary?
I used the len() of the existing result item:
>>> import collections
>>> data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
>>> result = collections.defaultdict(dict)
>>> for item in data:
... country = item['country']
... result[country][len(result[country])] = {'key': item['key']}
...
>>> dict(result)
{'Canada': {0: {'key': 21}}, 'USA': {0: {'key': 11}, 1: {'key': 12}}}
There may be a more efficient way to do this, but I thought this would be most readable.
@zigg's answer is better.
Here's an alternative way:
import itertools as it, operator as op
def dict_transform(dataset, key_name=None, group_by=None):
    result = {}
    # sort the dataset that was passed in (not the global data) so groupby sees equal keys together
    sorted_dataset = sorted(dataset, key=op.itemgetter(group_by))
    for k, g in it.groupby(sorted_dataset, key=op.itemgetter(group_by)):
        result[k] = {i: {key_name: j[key_name]} for i, j in enumerate(g)}
    return result
if __name__ == '__main__':
    data = [{'key': 11, 'country': 'USA'},
            {'key': 21, 'country': 'Canada'},
            {'key': 12, 'country': 'USA'}]
    expected_result = {'USA': {0: {'key':11}, 1: {'key': 12}},
                       'Canada': {0: {'key':21}}}
    result = dict_transform(data, key_name='key', group_by='country')
    assert result == expected_result
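To add the number, use the {key: value} syntax, and derive k from the entries already stored for that country instead of resetting it to 0 on every iteration:
result = {}
for i in data:
    group = result.setdefault(i['country'], {})
    k = len(group)
    group[k] = {'key': i['key']}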
dict(k = dict(key=i['key']))
This passes i['key'] as the key keyword argument to the dict constructor (which is what you want - since that results in the string "key" being used as a key), and then passes the result of that as the k keyword argument to the dict constructor (which is not what you want) - that's how parameter passing works in Python. The fact that you have a local variable named k is irrelevant.
To make a dict where the value of k is used as a key, the simplest way is to use the literal syntax for dictionaries: {1:2, 3:4} is a dict where the key 1 is associated with the value 2, and the key 3 is associated with the value 4. Notice that here we're using arbitrary expressions for keys and values - not names - so we can use a local variable and the resulting dictionary will use the named value.
Thus, you want {k: {'key': i['key']}}.
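The difference between the keyword form and the literal form is easy to check in isolation. A minimal sketch (the variable names by_keyword and by_literal are made up for this illustration):

```python
k = 0

# Keyword argument: the NAME k becomes the string key 'k'
by_keyword = dict(k={'key': 11})
print(by_keyword)
# {'k': {'key': 11}}

# Literal syntax: the VALUE of k (here 0) becomes the key
by_literal = {k: {'key': 11}}
print(by_literal)
# {0: {'key': 11}}
```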
Maybe there is a more elegant way to create the dictionary?
You could create a list by appending items, and then transform the list into a dictionary with dict(enumerate(the_list)). That at least saves you from having to do the counting manually, but it's pretty indirect.
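The list-then-enumerate idea could look roughly like this. It is only a sketch: collecting the per-country lists in a defaultdict(list) is one possible choice, not the only one.

```python
from collections import defaultdict

data = [{'key': 11, 'country': 'USA'},
        {'key': 21, 'country': 'Canada'},
        {'key': 12, 'country': 'USA'}]

# First collect a plain list of entries per country
grouped = defaultdict(list)
for item in data:
    grouped[item['country']].append({'key': item['key']})

# Then let enumerate do the counting: list positions become the keys
result = {country: dict(enumerate(items)) for country, items in grouped.items()}
print(result)
# {'USA': {0: {'key': 11}, 1: {'key': 12}}, 'Canada': {0: {'key': 21}}}
```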
