python find sub tree minimum and maximum value


I have a tree of nodes. Each node has id, name, nodes, roles, parent_id and type.
I need to write a function that checks each node: if the node has roles, it finds the minimum and maximum duration among those roles.
After that I need to compare the result with the node's parent (if it has one): if the parent has a bigger maximum duration, the child gets the maximum duration of its parent/grandparent.
I'll add some examples and the output I expect to get.
# example 1
roles_1 = [{id: 'r1', 'duration': 4}]
roles_2 = [{id: 'r2', 'duration': 5}, {id: 'r3', 'duration': 8}]
input_tree1 = {
'nodes': [
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'nodes': [
{
'id': '2',
'name': 'o_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_1,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
}
],
}
],
}
]
}
A node can have a list of roles, and each role is an object with an id and a duration.
In this example the parent org1 has one child, o_folder_1, and one grandchild, o1_f1_project_1. The grandchild has roles, so I calculate the min and max duration of its roles (the max here is 8). Then I go one level up and calculate the same for the folder's roles, if it has any (here 4). 4 is less than 8, so the grandchild keeps 8 as its maximum. The results go into new min_duration and max_duration fields, as you can see in the expected tree:
expected_tree1 = {'nodes': [
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'nodes': [
{
'id': '2',
'name': 'o_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_1,
'min_duration': 4,
'max_duration': 4,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
'min_duration': 4,
'max_duration': 8
}
],
}
],
}
]
}
As you can see, I added new min_duration and max_duration fields, with the right values, to each node that has roles.
example 2:
# example 2
roles_1 = [{id: 'r1', 'duration': 12}]
roles_2 = [{id: 'r2', 'duration': 4}, {id: 'r3', 'duration': 2}]
roles_3 = [{id: 'r4', 'duration': 5}]
roles_4 = [{id: 'r5', 'duration': 9}]
input_tree2 = {'nodes':
[
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'nodes':
[
{
'id': '2',
'name': 'o1_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_1,
'min_duration': 12,
'max_duration': 12,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
},
{
'id': '4',
'name': 'o1_f1_project_2',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_3,
},
],
},
{
'id': '4',
'name': 'o1_folder_2',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_4,
'nodes': [
{
'id': '3',
'name': 'o1_f2_project1',
'nodes': [],
'org_id': '1',
'parent_id': '4',
'type': 'project',
'roles': roles_3,
}
],
},
],
}]}
expected tree 2
expected_tree2 = {'nodes':
[
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'nodes':
[
{
'id': '2',
'name': 'o1_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_1,
'min_duration': 12,
'max_duration': 12,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
'min_duration': 2,
'max_duration': 12
},
{
'id': '4',
'name': 'o1_f1_project_2',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_3,
'min_duration': 5,
'max_duration': 12
},
],
},
{
'id': '4',
'name': 'o1_folder_2',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_4,
'min_duration': 9,
'max_duration': 9,
'nodes': [
{
'id': '3',
'name': 'o1_f2_project1',
'nodes': [],
'org_id': '1',
'parent_id': '4',
'type': 'project',
'roles': roles_3,
'min_duration': 5,
'max_duration': 9
}
],
},
],
}]}
example 3
roles_1 = [{id: 'r1', 'duration': 18}]
roles_2 = [{id: 'r2', 'duration': 4}, {id: 'r3', 'duration': 2}]
roles_3 = [{id: 'r4', 'duration': 5}]
roles_4 = [{id: 'r5', 'duration': 9}, {id: 'r5', 'duration': 12}]
roles_5 = [{id: 'r6', 'duration': 20}]
input_tree3 = {'nodes':
[
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'roles': roles_1,
'nodes':
[
{
'id': '2',
'name': 'o1_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_3,
'min_duration': 12,
'max_duration': 12,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
},
{
'id': '4',
'name': 'o1_f1_project_2',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
},
],
},
{
'id': '4',
'name': 'o1_folder_2',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_5,
'nodes': [
{
'id': '3',
'name': 'o1_f2_project1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_4,
}
],
},
],
}]}
expected tree 3
expected_tree3 = {'nodes':
[
{
'id': '1',
'name': 'org1',
'parent_id': None,
'type': 'organization',
'roles': roles_1,
'min_duration': 18,
'max_duration': 18,
'nodes':
[
{
'id': '2',
'name': 'o1_folder_1',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_3,
'min_duration': 5,
'max_duration': 18,
'nodes': [
{
'id': '3',
'name': 'o1_f1_project_1',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
'min_duration': 2,
'max_duration': 18
},
{
'id': '4',
'name': 'o1_f1_project_2',
'nodes': [],
'org_id': '1',
'parent_id': '2',
'type': 'project',
'roles': roles_2,
'min_duration': 2,
'max_duration': 18
},
],
},
{
'id': '4',
'name': 'o1_folder_2',
'org_id': '1',
'parent_id': '1',
'type': 'folder',
'roles': roles_5,
'min_duration': 18,
'max_duration': 20,
'nodes': [
{
'id': '3',
'name': 'o1_f2_project1',
'nodes': [],
'org_id': '1',
'parent_id': '4',
'type': 'project',
'roles': roles_4,
'min_duration': 9,
'max_duration': 20
}
],
},
],
}]}
As you can see in the examples, for each node that has roles I calculate the min and max duration, then compare with its parent, if it has one: if the parent has a larger maximum duration, the child takes it. If not, the child keeps its own maximum duration; the parent is not affected by it and does not take the child's duration.
Assumptions:
The node whose parent_id is None is the root.
The tree can have many levels (root -> child -> grandchild -> ...).
If the root doesn't have roles, its children still need to be checked.
A node can have any number of children (it is not always a binary tree).
What I have tried so far:
from typing import Dict

def add_min_max_duration(tree: Dict):
    for node in tree["nodes"]:
        if "roles" in node.keys():
            node["min_duration"] = 25
            node["max_duration"] = 0
            for role in node["roles"]:
                # each role is a dict, so the duration is read by key
                node["min_duration"] = min(node["min_duration"], role["duration"])
                node["max_duration"] = max(node["max_duration"], role["duration"])
        add_min_max_duration(node)
But this is not good enough, because it does not compare each node with its parent/grandparent.

data abstraction
Start with a min_max data type with a + operation for combining two min_max instances -
mm1 = min_max(2,5)
mm2 = min_max(3,9)
mm3 = mm1 + mm2 # (2, 9)
We could write it something like this -
from math import inf

class min_max:
    def __init__(self, min = +inf, max = -inf):
        self.min = min
        self.max = max
    def __add__(self, other):
        return min_max(min(self.min, other.min), max(self.max, other.max))
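The defaults of +inf and -inf make an empty instance behave as an identity for +, so it can safely seed any chain of combinations. A minimal sketch of that property follows; MinMax, lo and hi are hypothetical names used only for this illustration, and the __repr__ method is an addition for printing.

```python
from math import inf

class MinMax:  # hypothetical stand-in that mirrors the min_max class above
    def __init__(self, lo=+inf, hi=-inf):
        self.lo = lo
        self.hi = hi

    def __add__(self, other):
        # combine two ranges into the smallest range covering both
        return MinMax(min(self.lo, other.lo), max(self.hi, other.hi))

    def __repr__(self):
        return f"MinMax({self.lo}, {self.hi})"

empty = MinMax()
combined = empty + MinMax(2, 5) + MinMax(3, 9)
print(combined)  # MinMax(2, 9)
```

Because the empty instance never wins a min or max comparison against a real value, no special case is needed for "nothing seen yet".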
role_min_max
We can compute the min_max of a role by writing role_min_max -
def role_min_max(role):
    return iter_min_max(x["duration"] for x in role)

def iter_min_max(iterable):
    r = min_max()
    for value in iterable:
        r = r + min_max(value, value)
    return r
tree_min_max
Finally tree_min_max starts with a default min_max of (+Infinity, -Infinity) and we + it to the role_min_max of the tree["roles"] value, if present. For each node in the children tree["nodes"], we call tree_min_max(node, mm) with the updated min_max, mm -
def tree_min_max(tree, mm = min_max()):
    if "roles" in tree:
        mm = mm + role_min_max(tree["roles"])
    return {
        "duration_min": mm.min,
        "duration_max": mm.max,
        **tree,
        "nodes": list(tree_min_max(node, mm) for node in tree["nodes"])
    }
demo
We will use json to pretty print the newly created tree -
import json
print(json.dumps(tree_min_max(input_tree3), indent=2))
{
"duration_min": Infinity,
"duration_max": -Infinity,
"nodes": [
{
"duration_min": 18,
"duration_max": 18,
"id": "1",
"name": "org1",
"parent_id": null,
"type": "organization",
"roles": [
{
"id": "r1",
"duration": 18
}
],
"nodes": [
{
"duration_min": 5,
"duration_max": 18,
"id": "2",
"name": "o1_folder_1",
"org_id": "1",
"parent_id": "1",
"type": "folder",
"roles": [
{
"id": "r4",
"duration": 5
}
],
"nodes": [
{
"duration_min": 2,
"duration_max": 18,
"id": "3",
"name": "o1_f1_project_1",
"nodes": [],
"org_id": "1",
"parent_id": "2",
"type": "project",
"roles": [
{
"id": "r2",
"duration": 4
},
{
"id": "r3",
"duration": 2
}
]
},
{
"duration_min": 2,
"duration_max": 18,
"id": "4",
"name": "o1_f1_project_2",
"nodes": [],
"org_id": "1",
"parent_id": "2",
"type": "project",
"roles": [
{
"id": "r2",
"duration": 4
},
{
"id": "r3",
"duration": 2
}
]
}
]
},
{
"duration_min": 18,
"duration_max": 20,
"id": "4",
"name": "o1_folder_2",
"org_id": "1",
"parent_id": "1",
"type": "folder",
"roles": [
{
"id": "r6",
"duration": 20
}
],
"nodes": [
{
"duration_min": 9,
"duration_max": 20,
"id": "3",
"name": "o1_f2_project1",
"nodes": [],
"org_id": "1",
"parent_id": "2",
"type": "project",
"roles": [
{
"id": "r5",
"duration": 9
},
{
"id": "r5",
"duration": 12
}
]
}
]
}
]
}
]
}
Note, your role_* data should put id in quotes, otherwise the key of the dictionary is actually the built-in id function, not the string "id" as you probably intended.
Also note that the input you provided contained hard-coded "min_duration" and "max_duration" values, which I removed for this post.
To remove duration_min and duration_max from the root node, we can add a special case for when "roles" is not present in the node -
def tree_min_max(tree, mm = min_max()):
    if "roles" in tree:
        mm = mm + role_min_max(tree["roles"])
        return {
            "duration_min": mm.min,
            "duration_max": mm.max,
            **tree,
            "nodes": list(tree_min_max(node, mm) for node in tree["nodes"])
        }
    else:
        return {
            **tree,
            "nodes": list(tree_min_max(node, mm) for node in tree["nodes"])
        }
without the min_max class
We could write min_max as a simple function which takes two tuples and outputs a tuple of the combined minimums and maximums -
def min_max(a, b):
    (amin, amax) = a
    (bmin, bmax) = b
    return min(amin, bmin), max(amax, bmax)
Then we update iter_min_max -
def iter_min_max(iterable):
    r = (+inf, -inf)
    for value in iterable:
        r = min_max(r, (value, value))
    return r
In tree_min_max we need to initialize mm to the default min-max and if "roles" is present, update mm with the computed role_min_max. Notice because mm is no longer a class, we cannot access the subfields, mm.min and mm.max. Instead mm is a tuple and the fields are mm[0] and mm[1] -
def tree_min_max(tree, mm = (+inf, -inf)):
    if "roles" in tree:
        mm = min_max(mm, role_min_max(tree["roles"]))
    return {
        "duration_min": mm[0],
        "duration_max": mm[1],
        **tree,
        "nodes": list(tree_min_max(node, mm) for node in tree["nodes"])
    }
This puts more cognitive load on the programmer, who must remember to initialize every min-max with (+inf, -inf) manually. Using the class avoids this burden and makes mistakes less likely.
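For comparison, here is a self-contained sketch that writes the field names used in the question (min_duration and max_duration) and checks itself against the question's first example. The function name annotate and the rule "skip the fields when neither the node nor any ancestor has roles" are assumptions made for this illustration, not part of the answer above.

```python
from math import inf

def annotate(node, inherited=(inf, -inf)):
    """Return a copy of node with min_duration/max_duration added where known."""
    lo, hi = inherited
    for role in node.get("roles", []):
        lo = min(lo, role["duration"])
        hi = max(hi, role["duration"])
    result = dict(node)
    if lo <= hi:  # at least one role was seen on this node or on an ancestor
        result["min_duration"] = lo
        result["max_duration"] = hi
    # children inherit the range accumulated so far
    result["nodes"] = [annotate(child, (lo, hi)) for child in node.get("nodes", [])]
    return result

tree = {"nodes": [{
    "id": "1", "nodes": [{
        "id": "2", "roles": [{"id": "r1", "duration": 4}],
        "nodes": [{"id": "3", "nodes": [],
                   "roles": [{"id": "r2", "duration": 5}, {"id": "r3", "duration": 8}]}],
    }],
}]}

annotated = annotate(tree)
folder = annotated["nodes"][0]["nodes"][0]
project = folder["nodes"][0]
print(folder["min_duration"], folder["max_duration"])    # 4 4
print(project["min_duration"], project["max_duration"])  # 4 8
```

Unlike the special case shown earlier, this sketch also annotates a node that has no roles of its own when one of its ancestors has roles; whether that is wanted depends on the requirements.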

Related

Python: Change a JSON value

Let's say I have the following JSON file named output.
{'fields': [{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}],
'type': 'struct'}
If the type key has the value datetimeoffset, I would like to change it to dateTime, and if it has the value Int32, I would like to change it to integer. I have several values like these to replace.
The expected output is
{'fields': [{'name': 2, 'type': 'integer'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'dateTime'}],
'type': 'struct'}
Can anyone help with this in Python?

You can try this out:
substitute = {"Int32": "integer", "datetimeoffset": "dateTime"}
x = {'fields': [
{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'}
],'type': 'struct'}
for i in range(len(x['fields'])):
    if x['fields'][i]["type"] in substitute:
        x['fields'][i]['type'] = substitute[x['fields'][i]['type']]
print(x)

You can use the following code. Include in equivalences dict the values you want to replace:
json = {
'fields': [
{'name': 2, 'type': 'Int32'},
{'name': 12, 'type': 'string'},
{'name': 9, 'type': 'datetimeoffset'},
],
'type': 'struct'
}
equivalences = {"datetimeoffset": "dateTime", "Int32": "integer"}
# Replace values based on the equivalences dict
for i, data in enumerate(json["fields"]):
    if data["type"] in equivalences.keys():
        json["fields"][i]["type"] = equivalences[data["type"]]
print(json)
The output is:
{
"fields": [
{
"name": 2,
"type": "integer"
},
{
"name": 12,
"type": "string"
},
{
"name": 9,
"type": "dateTime"
}
],
"type": "struct"
}

Simple but ugly way:
import json
json_ = {'fields': [{'name': 2, 'type': 'Int32'},
                    {'name': 12, 'type': 'string'},
                    {'name': 9, 'type': 'datetimeoffset'}], 'type': 'struct'}
# note: this replaces the text wherever it occurs in the serialized data, not only in 'type' values
result = json.loads(json.dumps(json_).replace("datetimeoffset", "dateTime").replace("Int32", "integer"))
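The answers above change the data in place or go through a string. As an alternative sketch, the lookup can use dict.get with the current value as the fallback, which leaves unknown types untouched and builds a new structure instead of mutating the input. The names TYPE_MAP and rename_types are hypothetical, chosen only for this illustration.

```python
TYPE_MAP = {"Int32": "integer", "datetimeoffset": "dateTime"}  # hypothetical name

def rename_types(schema, mapping=TYPE_MAP):
    """Return a copy of schema with each field's 'type' replaced via mapping."""
    fields = [
        # mapping.get falls back to the current value when no replacement exists
        {**field, "type": mapping.get(field["type"], field["type"])}
        for field in schema["fields"]
    ]
    return {**schema, "fields": fields}

schema = {
    "fields": [
        {"name": 2, "type": "Int32"},
        {"name": 12, "type": "string"},
        {"name": 9, "type": "datetimeoffset"},
    ],
    "type": "struct",
}

converted = rename_types(schema)
print([field["type"] for field in converted["fields"]])  # ['integer', 'string', 'dateTime']
```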

Remove nested element occurs twice but should be only once

I have a problem. I want to pull all nested elements out of a dict, but my code does not work as intended: every nested element occurs twice in the result, when it should occur only once.
What is causing this?
Method
def nested_dict(dictionaries):
    my_list = []
    for my_Dict in dictionaries:
        my_new_dict = {}
        for key in my_Dict.keys():
            if isinstance(my_Dict[key], dict):
                idx = str(uuid.uuid4())
                my_Dict[key]["__id"] = idx
                my_new_dict[key] = my_Dict[key]
                my_Dict[key] = idx
                my_list.append(my_new_dict)
    return my_list
Running example
import uuid
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
my_Dict2 = {
'_key': '2',
'group': 'test',
'data2': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail2': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
dictionaries = [my_Dict, my_Dict2]
def nested_dict(dictionaries):
    my_list = []
    for my_Dict in dictionaries:
        my_new_dict = {}
        for key in my_Dict.keys():
            if isinstance(my_Dict[key], dict):
                idx = str(uuid.uuid4())
                my_Dict[key]["__id"] = idx
                my_new_dict[key] = my_Dict[key]
                my_Dict[key] = idx
                my_list.append(my_new_dict)
    return my_list
result = nested_dict(dictionaries)
result
[OUT]
[{'data': {'__id': '46f4eb3d-977c-4da4-a99c-c9bfa831b96e'},
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': 'fad4053e-75e5-4a03-93b6-67e0df814d23'}},
{'data': {'__id': '46f4eb3d-977c-4da4-a99c-c9bfa831b96e'},
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': 'fad4053e-75e5-4a03-93b6-67e0df814d23'}},
{'data2': {'__id': '6afcf48e-508c-476b-98f3-9bf1e8370fb4'},
'detail2': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': '2d4745ea-decd-45dc-aa0b-7bea5c449c34'}},
{'data2': {'__id': '6afcf48e-508c-476b-98f3-9bf1e8370fb4'},
'detail2': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': '2d4745ea-decd-45dc-aa0b-7bea5c449c34'}}]
What I want
[{'data': {'__id': '46f4eb3d-977c-4da4-a99c-c9bfa831b96e'},
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': 'fad4053e-75e5-4a03-93b6-67e0df814d23'}},
{'data2': {'__id': '6afcf48e-508c-476b-98f3-9bf1e8370fb4'},
'detail2': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]},
'__id': '2d4745ea-decd-45dc-aa0b-7bea5c449c34'}}]

import uuid
import json
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
my_Dict2 = {
'_key': '2',
'group': 'test',
'data2': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail2': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
dictionaries = [my_Dict, my_Dict2]
def nested_dict(dictionaries):
    my_list = []
    for my_Dict in dictionaries:
        my_new_dict = {}
        for key in my_Dict.keys():
            if isinstance(my_Dict[key], dict):
                idx = str(uuid.uuid4())
                my_Dict[key]["__id"] = idx
                my_new_dict[key] = my_Dict[key]
                my_Dict[key] = idx
        # append once per input dict, at the outer loop level
        my_list.append(my_new_dict)
    return my_list
output:
[
{
"data": {
"__id": "5c6769cf-01e5-4f5d-acfa-622472163aba"
},
"detail": {
"selector": {
"number": "12312",
"isTrue": true,
"requirements": [
{
"type": "customer",
"requirement": "1"
}
]
},
"__id": "d167277f-4d02-4d53-934b-131187f6f214"
}
},
{
"data2": {
"__id": "e9182913-c2fc-4d60-adb8-b0b8274faf50"
},
"detail2": {
"selector": {
"number": "12312",
"isTrue": true,
"requirements": [
{
"type": "customer",
"requirement": "1"
}
]
},
"__id": "46e6be7b-8903-4d2a-a768-f6b24fcc5d31"
}
}
]
Only a minor change is needed: you are appending to the list inside the inner for loop, but you should do it at the outer for loop level. I have pasted the corrected code above, together with the output I got.

I think it is because my_new_dict is holding an object that is changed by the time it appends to the list.
def nested_dict(dictionaries):
    my_list = []
    for my_Dict in dictionaries:
        my_new_dict = {}
        for key in my_Dict.keys():
            if isinstance(my_Dict[key], dict):
                idx = str(uuid.uuid4())
                my_Dict[key]["__id"] = idx
                my_new_dict[key] = my_Dict[key]
                my_Dict[key] = idx
                my_list.append({key: my_new_dict[key]})
    print(my_list)
    return my_list
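The fix discussed above comes down to appending once per input dict rather than once per nested key. A minimal sketch of that idea follows, written so that it does not modify the caller's dictionaries. The name extract_nested is hypothetical, and unlike the original function this version does not replace the nested values in the source with their ids.

```python
import uuid

def extract_nested(dictionaries):
    """Collect the dict-valued entries of each input dict, one output dict per input."""
    extracted = []
    for source in dictionaries:
        nested = {}
        for key, value in source.items():
            if isinstance(value, dict):
                # copy the nested dict so the caller's data is left untouched
                nested[key] = {**value, "__id": str(uuid.uuid4())}
        extracted.append(nested)  # once per input dict, outside the inner loop
    return extracted

first = {"_key": "1", "data": {}, "detail": {"selector": {"number": "12312"}}}
second = {"_key": "2", "data2": {}, "detail2": {"selector": {"number": "12312"}}}

result = extract_nested([first, second])
print(len(result))        # 2
print(sorted(result[0]))  # ['data', 'detail']
```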

How to get the count for a particular key in the dictionary

My data is a list of dictionaries, shown below.
For BusinessArea I need to know how many times each distinct name occurs, and I need the same for Designation.
test=
[ { 'masterid': '1', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Accounting', 'parentname': 'Finance'}, { 'id': '3', 'name': 'Research', 'parentname': 'R & D' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }] },
{ 'masterid': '2', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Research', 'parentname': '' }, { 'id': '3', 'name': 'Accounting', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Tester' }, { 'id': '5033', 'name': 'Developer' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]},
{ 'masterid': '3', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Engineering' }, { 'id': '3', 'name': 'Engineering', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Developer' }, { 'id': '5033', 'name': 'Developer', 'parentname': '' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]}]
That is, for every name under BusinessArea and Designation, I want the number of masterid records that contain it.
The expected output is below
[
{
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": "2"
},
{
"name": "Research",
"count": "2"
},
{
"name": "Engineering",
"count": "1"
}
]
},
{
"name": "Designation",
"values": [
{
"name": "L1",
"count": "3"
},
{
"name": "l2",
"count": "3"
}
]
}
]

Try this:
res=[{'name': 'BusinessArea', 'values': []}, {'name': 'Designation', 'values': []}]
listbus=sum([i['BusinessArea'] for i in test], [])
listdes=sum([i['Designation'] for i in test], [])
res[0]['values']=[{'name':i, 'count':0} for i in set(k['name'] for k in listbus)]
res[1]['values']=[{'name':i, 'count':0} for i in set(k['name'] for k in listdes)]
for i in listbus:
    for k in range(len(res[0]['values'])):
        if i['name']==res[0]['values'][k]['name']:
            res[0]['values'][k]['count']+=1
for i in listdes:
    for k in range(len(res[1]['values'])):
        if i['name']==res[1]['values'][k]['name']:
            res[1]['values'][k]['count']+=1
>>> print(res)
[{'name': 'BusinessArea', 'values': [{'name': 'Accounting', 'count': 2}, {'name': 'Research', 'count': 2}, {'name': 'Engineering', 'count': 2}]}, {'name': 'Designation', 'values': [{'name': 'L1', 'count': 3}, {'name': 'L2', 'count': 6}]}]

You could count unique names using a nested collections.defaultdict:
from collections import defaultdict
from json import dumps
keys = ["BusinessArea", "Designation"]
group_counts = defaultdict(lambda: defaultdict(int))
for group in test:
    for key in keys:
        names = [item["name"] for item in group[key]]
        unique_names = list(dict.fromkeys(names))
        for name in unique_names:
            group_counts[key][name] += 1
print(dumps(group_counts, indent=2))
Which will give you these counts:
{
"BusinessArea": {
"Accounting": 2,
"Research": 2,
"Engineering": 1
},
"Designation": {
"L1": 3,
"L2": 3
}
}
Then you could modify the result to get the list of dicts you expect:
result = [
    {
        "name": name,
        "values": [{"name": value, "count": count} for value, count in counts.items()],
    }
    for name, counts in group_counts.items()
]
print(dumps(result, indent=2))
Which gives you this:
[
{
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": 2
},
{
"name": "Research",
"count": 2
},
{
"name": "Engineering",
"count": 1
}
]
},
{
"name": "Designation",
"values": [
{
"name": "L1",
"count": 3
},
{
"name": "L2",
"count": 3
}
]
}
]
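The same "count each name once per record" idea can also be sketched with collections.Counter, using a set to drop repeats within a record. The helper name count_names and the small sample data are hypothetical, made up for this illustration.

```python
from collections import Counter

def count_names(records, key):
    """Count, for each name under `key`, how many records contain it at least once."""
    counts = Counter()
    for record in records:
        # a set makes a name that repeats inside one record count only once
        counts.update({item["name"] for item in record.get(key, [])})
    return counts

sample = [
    {"masterid": "1", "Designation": [{"name": "L1"}, {"name": "L2"}, {"name": "L2"}]},
    {"masterid": "2", "Designation": [{"name": "L1"}]},
]

designation_counts = count_names(sample, "Designation")
print(dict(designation_counts))
```

Using record.get(key, []) means a record that lacks the key is simply skipped instead of raising a KeyError.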

How can I create aggregate expressions of this list of dicts?

I have a list of dictionaries that expresses periods+days for a class in a student information system. Here's the data I'd like to aggregate:
[
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'A',
'sort_order': 1
}
},
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'B',
'sort_order': 2
}
},
{
'period': {
'name': '1',
'sort_order': 1
},
'day': {
'name': 'C',
'sort_order': 1
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'A',
'sort_order': 1
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'B',
'sort_order': 2
}
},
{
'period': {
'name': '3',
'sort_order': 3
},
'day': {
'name': 'C',
'sort_order': 2
}
},
{
'period': {
'name': '4',
'sort_order': 4
},
'day': {
'name': 'D',
'sort_order': 3
}
}
]
The aggregated string I'd like the above to reduce to is 1,3(A-C) 4(D). Notice that objects that aren't "adjacent" (determined by the object's sort_order) to each other are delimited by , and "adjacent" records are delimited by a -.
EDIT
Let me try to elaborate on the aggregation process. Each "class meeting" object contains a period and day. There are usually ~5 periods per day, and the days alternate cyclically between A,B,C,D, etc. So if I have a class that occurs 1st period on an A day, we might express that as 1(A). If a class occurs on 1st and 2nd period on an A day, the raw form of that might be 1(A),2(A), but it can be shortened to 1-2(A).
Some classes might not be in "adjacent" periods or days. A class might occur on 1st period and 3rd period on an A day, so its short form would be 1,3(A). However, if that class were on 1st, 2nd, and 3rd period on an A day, it could be written as 1-3(A). This also applies to days, so if a class occurs on 1st,2nd, and 3rd period, on A,B, and C day, then we could write it 1-3(A-C).
Finally, if a class occurs on 1st,2nd, and 3rd period and on A,B, and C day, but also on 4th period on D day, its short form would be 1-3(A-C) 4(D).
What I've tried
The first step that occurs to me to perform is to "group" the meeting objects into related sub-lists with the following function:
def _to_related_lists(list):
    """Given a list of section meeting dicts, return a list of lists, where each sub-list is list of
    related section meetings, either related by period or day"""
    related_list = []
    sub_list = []
    related_values = set()
    for index, section_meeting_object in enumerate(list):
        # starting with empty values list
        if not related_values:
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
        elif section_meeting_object['period']['name'] in related_values or section_meeting_object['day']['name'] in related_values:
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
        else:
            # no related values found in current section_meeting_object
            related_list.append(sub_list)
            sub_list = []
            related_values = set()
            related_values.add(section_meeting_object['period']['name'])
            related_values.add(section_meeting_object['day']['name'])
            sub_list.append(section_meeting_object)
    related_list.append(sub_list)
    return related_list
Which returns:
[
[{
'period': {
'sort_order': 1,
'name': '1'
},
'day': {
'sort_order': 1,
'name': 'A'
}
}, {
'period': {
'sort_order': 1,
'name': '1'
},
'day': {
'sort_order': 2,
'name': 'B'
}
}, {
'period': {
'sort_order': 2,
'name': '2'
},
'day': {
'sort_order': 1,
'name': 'A'
}
}, {
'period': {
'sort_order': 2,
'name': '2'
},
'day': {
'sort_order': 2,
'name': 'B'
}
}],
[{
'period': {
'sort_order': 4,
'name': '4'
},
'day': {
'sort_order': 3,
'name': 'C'
}
}]
]
If the entire string 1-3(A-C) 4(D) is the aggregate expression I'd like in the end, let's call 1-3(A-C) and 4(D) "sub-expressions". Each related sub-list would be a "sub-expression", so I was thinking I'd somehow iterate through every sub-list and create the sub-expression, but I'm not exactly sure how to do that.

First, let us define your list as d_list.
d_list = [
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'C'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'C'}},
{'period': {'sort_order': 4, 'name': '4'}, 'day': {'sort_order': 3, 'name': 'D'}},
]
Note that I use the python native module string to define that B is between A and C. Thus what you may want to do is
import string
# group the day names by period name
agg0 = {}
for d in d_list:
    name = d['period']['name']
    if name not in agg0:
        agg0[name] = []
    day = d['day']
    agg0[name].append(day['name'])
# keep only the first and last day (alphabetically) of each period
agg1 = {}
for k,v in agg0.items():
    pos_in_alph = [string.ascii_lowercase.index(el.lower()) for el in v]
    allowed_indexes = [max(pos_in_alph),min(pos_in_alph)]
    agg1[k] = [el for el in v if string.ascii_lowercase.index(el.lower()) in allowed_indexes]
# group the periods that share the same day range
agg = {}
for k,v in agg1.items():
    w = tuple(v)
    if w not in agg:
        agg[w] = {'ks':[],'gr':len(agg0[k])>2}
    agg[w]['ks'].append(k)
    print(agg[w])  # debug output
str_ = ''
for k,v in sorted(agg.items(), key=lambda item:item[0], reverse=False):
    str_ += ' {pnames}({dnames})'.format(pnames=('-' if v['gr'] else ',').join(sorted(v['ks'])),
                                          dnames='-'.join(k))
print(str_.strip())
which outputs 1-3(A-C) 4(D)
Following @NathanJones's comment, note that if d_list were defined as
d_list = [
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'A'}},
##{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 1, 'name': '1'}, 'day': {'sort_order': 1, 'name': 'C'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 1, 'name': 'A'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'B'}},
{'period': {'sort_order': 3, 'name': '3'}, 'day': {'sort_order': 2, 'name': 'C'}},
{'period': {'sort_order': 4, 'name': '4'}, 'day': {'sort_order': 3, 'name': 'D'}},
]
The code above would print 1,3(A-C) 4(D)
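The core rule in the question, "-" between adjacent items and "," between non-adjacent ones, can be isolated in a small helper that works on sort_order values instead of alphabet positions. This is only a sketch of that single step, not a full solution; the name collapse_runs is hypothetical, and it assumes every name always carries the same sort_order, which does not hold for day C in the sample data.

```python
def collapse_runs(items):
    """Join (sort_order, name) pairs, using '-' for consecutive runs and ',' between runs."""
    ordered = sorted(set(items))
    runs = []
    for order, name in ordered:
        if runs and order == runs[-1][-1][0] + 1:
            runs[-1].append((order, name))  # extends the current run
        else:
            runs.append([(order, name)])    # starts a new run
    parts = []
    for run in runs:
        first, last = run[0][1], run[-1][1]
        parts.append(first if len(run) == 1 else f"{first}-{last}")
    return ",".join(parts)

print(collapse_runs([(1, "1"), (2, "2"), (3, "3")]))  # 1-3
print(collapse_runs([(1, "1"), (3, "3")]))            # 1,3
print(collapse_runs([(1, "A"), (2, "B"), (3, "C")]))  # A-C
```

A sub-expression such as 1-3(A-C) could then be assembled by calling the helper once for the periods and once for the days of a related group.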

Python sort a JSON list by two key values

I have a JSON list that looks like this:
[{ "id": "1", "score": "100" },
{ "id": "3", "score": "89" },
{ "id": "1", "score": "99" },
{ "id": "2", "score": "100" },
{ "id": "2", "score": "59" },
{ "id": "3", "score": "22" }]
I want to sort by id first, so I used
sorted_list = sorted(json_list, key=lambda k: int(k['id']), reverse = False)
This only sorts the list by id. Within each id, I also want to sort by score, so the final list I want looks like this:
[{ "id": "1", "score": "100" },
{ "id": "1", "score": "99" },
{ "id": "2", "score": "100" },
{ "id": "2", "score": "59" },
{ "id": "3", "score": "89" },
{ "id": "3", "score": "22" }]
So for each id, sort their score as well. Any idea how to do that?

Use a tuple as the key, adding -int(k["score"]) as a second sort key so that ties on id are broken by score in descending order. The reverse argument is not needed:
sorted_list = sorted(json_list, key=lambda k: (int(k['id']),-int(k["score"])))
[{'score': '100', 'id': '1'},
{'score': '99', 'id': '1'},
{'score': '100', 'id': '2'},
{'score': '59', 'id': '2'},
{'score': '89', 'id': '3'},
{'score': '22', 'id': '3'}]
So we primarily sort by id from lowest to highest, and we break ties using score from highest to lowest. In Python versions before 3.7, dicts are also unordered, so there is no way to put id before score when you print without using something like an OrderedDict; from Python 3.7 on, dicts preserve insertion order.
Or use pprint:
from pprint import pprint as pp
pp(sorted_list)
[{'id': '1', 'score': '100'},
{'id': '1', 'score': '99'},
{'id': '2', 'score': '100'},
{'id': '2', 'score': '59'},
{'id': '3', 'score': '89'},
{'id': '3', 'score': '22'}]
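Negating the score only works because the secondary key is numeric. A sketch of an alternative that does not depend on negation is to sort twice, relying on the fact that Python's sorted is stable: sort by the secondary key first, then by the primary key. The variable name by_score is made up for this illustration.

```python
json_list = [
    {"id": "1", "score": "100"},
    {"id": "3", "score": "89"},
    {"id": "1", "score": "99"},
    {"id": "2", "score": "100"},
    {"id": "2", "score": "59"},
    {"id": "3", "score": "22"},
]

# Pass 1: secondary key (score, highest first).
by_score = sorted(json_list, key=lambda k: int(k["score"]), reverse=True)
# Pass 2: primary key (id, lowest first); the stable sort keeps the score order within each id.
sorted_list = sorted(by_score, key=lambda k: int(k["id"]))

print([(d["id"], d["score"]) for d in sorted_list])
```

The same two-pass approach also works when the secondary key is a string or another type that cannot be negated.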
