I have a JSON file that contains many nested children, like this:
{
"tree": {
"name": "Top Level",
"children": [
{
"name": "[('server', 'Cheese')]",
"children": [
{
"name": "[('waiter', 'mcdonalds')]",
"percentage": "100.00%",
"duration": 100,
"children": [
{
"name": "[('server', 'kfc')]",
"percentage": "15.73%",
"duration": 100,
"children": [
{
"name": "[('server', 'wendys')]",
"percentage": "12.64%",
"duration": 100
},
{
"name": "[('boss', 'dennys')]",
"percentage": "10.96%",
"duration": 100
}
]
},
{
"name": "[('cashier', 'chickfila')]",
"percentage": "10.40%",
"duration": 100,
"children": [
{
"name": "[('cashier', 'burger king')]",
"percentage": "11.20%",
"duration": 100
}
]
}
]
}
]
}
]
}
}
I want to add a unique ID to each child that encodes its level, so it ends up looking like this. Each ID tells you how many parents the node has and how deep into the JSON you are; for example, 21.2.3.102 would be the 102nd child of the 3rd child of the 2nd child of the 21st top-level node:
{
"tree": {
"name": "Top Level",
"id": "1",
"children": [
{
"name": "[('server', 'Cheese')]",
"id": "1.1",
"children": [
{
"name": "[('waiter', 'mcdonalds')]",
"percentage": "100.00%",
"duration": 100,
"id": "1.1.1",
"children": [
{
"name": "[('server', 'kfc')]",
"percentage": "15.73%",
"duration": 100,
"id": "1.1.1.1",
"children": [
{
"name": "[('server', 'wendys')]",
"percentage": "12.64%",
"duration": 100,
"id": "1.1.1.1.1"
},
{
"name": "[('boss', 'dennys')]",
"percentage": "10.96%",
"duration": 100,
"id": "1.1.1.1.2"
}
]
},
{
"name": "[('cashier', 'chickfila')]",
"percentage": "10.40%",
"duration": 100,
"id": "1.1.1.2",
"children": [
{
"name": "[('cashier', 'burger king')]",
"percentage": "11.20%",
"duration": 100,
"id": "1.1.1.2.1"
}
]
}
]
}
]
}
]
}
}
Is there a streamlined way to do this for a very long JSON file with many, many children?
Thanks!
You can walk the tree recursively, where d is the dictionary loaded from your JSON:
def walk(d, level="1"):
    d["id"] = level
    for i, child in enumerate(d.get("children", []), 1):
        walk(child, level + "." + str(i))
walk(d["tree"])
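A self-contained sketch of this recursion on a trimmed-down version of the input (the inline JSON here stands in for your file; with a real file you'd `json.load` it and `json.dump` the result back out). Note the IDs come out as strings, since dotted values like 1.1.1 are not valid JSON numbers:

```python
import json

def walk(d, level="1"):
    # Tag this node with its dotted-path id, then recurse into its children.
    d["id"] = level
    for i, child in enumerate(d.get("children", []), 1):
        walk(child, level + "." + str(i))

data = json.loads("""
{"tree": {"name": "Top Level", "children": [
    {"name": "[('server', 'Cheese')]", "children": [
        {"name": "[('waiter', 'mcdonalds')]"}
    ]}
]}}
""")

walk(data["tree"])
print(json.dumps(data, indent=2))
```

Because the function mutates the dictionaries in place, nothing needs to be rebuilt; one pass tags every node regardless of depth.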
I have this array of 3 objects.
The parameter that interests me is "id", which is nested inside the "categories" attribute.
list = [
{
"title": "\u00c9glise Saint-Julien",
"distance": 1841,
"excursionDistance": 1575,
"categories": [
{
"id": "300-3200-0030",
"name": "\u00c9glise",
"primary": true
},
{
"id": "300-3000-0025",
"name": "Monument historique"
}
]
},
{
"title": "Sevdec",
"distance": 2250,
"excursionDistance": 301,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
},
{
"title": "SIEGE 27",
"distance": 2651,
"excursionDistance": 1095,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
}
]
Then I have these two arrays that contain ids:
mCat1 = ["300-3000-0000","300-3000-0023","300-3000-0030","300-3000-0025","300-3000-0024","300-3100"] # macro cat1 = tourism
mCat2 = ["400-4300","700-7600-0322"]
I need to filter "list" on "mCat1" in order to extract into a new variable the object(s) that have at least one "id" matching those in "mCat1".
Then I need to do the same with "mCat2".
In this example the expected result would be:
mCat1Result = [{
"title": "\u00c9glise Saint-Julien",
"distance": 1841,
"excursionDistance": 1575,
"categories": [
{
"id": "300-3200-0030",
"name": "\u00c9glise",
"primary": true
},
{
"id": "300-3000-0025",
"name": "Monument historique"
}
]
}]
mCat2Result = [{
"title": "Sevdec",
"distance": 2250,
"excursionDistance": 301,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
},
{
"title": "SIEGE 27",
"distance": 2651,
"excursionDistance": 1095,
"categories": [
{
"id": "700-7600-0322",
"name": "Station de recharge",
"primary": true
}
]
}]
What would be the most efficient way to do this? I can do it with loops, but that is very resource-intensive on large datasets.
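One common approach is to turn the id lists into sets so each membership test is O(1), then keep only the objects whose categories intersect the set. A sketch on a trimmed-down version of the data (`places` stands in for your `list` variable, renamed so it doesn't shadow the built-in):

```python
places = [
    {"title": "Église Saint-Julien",
     "categories": [{"id": "300-3200-0030"}, {"id": "300-3000-0025"}]},
    {"title": "Sevdec",
     "categories": [{"id": "700-7600-0322"}]},
]

# Sets give O(1) membership tests, unlike lists.
mCat1 = {"300-3000-0000", "300-3000-0023", "300-3000-0030",
         "300-3000-0025", "300-3000-0024", "300-3100"}
mCat2 = {"400-4300", "700-7600-0322"}

def filter_by_ids(objs, id_set):
    # Keep objects with at least one category id in id_set.
    return [o for o in objs
            if any(c["id"] in id_set for c in o["categories"])]

mCat1Result = filter_by_ids(places, mCat1)
mCat2Result = filter_by_ids(places, mCat2)
```

The work is still linear in the total number of categories, but the per-id lookup cost no longer grows with the size of the macro-category list.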
I started using the Python Cubes OLAP framework recently.
I'm trying to sum/avg a JSON postgres column, how can i do this?
my db structure:
events
id
object_type
sn_name
spectra
id
snx_wavelengths (json column)
event_id
my json:
{
"dimensions": [
{
"name": "event",
"levels": [
{
"name": "object_type",
"label": "Object Type",
"attributes": [
"object_type"
]
},
{
"name": "sn_name",
"label": "name",
"attributes": [
"sn_name"
]
}
]
},
{
"name": "spectra",
"levels": [
{
"name": "catalog_name",
"label": "Catalog Name",
"attributes": [
"catalog_name"
]
},
{
"name": "capture_date",
"label": "Capture Date",
"attributes": [
"capture_date"
]
}
]
},
{
"name": "date"
}
],
"cubes": [
{
"id": "uid",
"name": "14G31Yx98ZG8aEhFHjOWNNBmFOETg5APjZo5AiHaqog5YxLMK5",
"dimensions": [
"event",
"spectra",
"date"
],
"aggregates": [
{
"name": "event_snx_wavelengths_sum",
"function": "sum",
"measure": "event.snx_wavelengths"
},
{
"name": "record_count",
"function": "count"
}
],
"joins": [
{
"master": "14G31Yx98ZG8aEhFHjOWNNBmFOETg5APjZo5AiHaqog5YxLMK5.id",
"detail": "spectra.event_id"
}
],
"mappings": {
"event.sn_name": "sn_name",
"event.object_type": "object_type",
"spectra.catalog_name": "spectra.catalog_name",
"spectra.capture_date": "spectra.capture_date",
"event.snx_wavelengths": "spectra.snx_wavelengths",
"date": "spectra.capture_date"
}
}
]
}
I'm getting the following error:
Unknown attribute ''event.snx_wavelengths''
Can anyone help?
I already tried using MongoDB to do the sum, but had no success.
This question already has answers here:
How to extract data from dictionary in the list
(3 answers)
Closed 11 months ago.
I have the following JSON output:
{
"detections": [
{
"source": "detection",
"uuid": "50594028",
"detectionTime": "2022-03-27T06:50:56Z",
"ingestionTime": "2022-03-27T07:04:50Z",
"filters": [
{
"id": "F2058",
"unique_id": "3638f7c0",
"level": "critical",
"name": "Possible Right-To-Left Override Attack",
"description": "Possible Right-To-Left Override Detected in the Filename",
"tactics": [
"TA0005"
],
"techniques": [
"T1036.002"
],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PIfdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PIfdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
}
]
},
{
"id": "F2140",
"unique_id": "5a313874",
"level": "medium",
"name": "Malicious Software",
"description": "A malicious software was detected on an endpoint.",
"tactics": [],
"techniques": [],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PIfdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/rs001291-excluido-20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PIfdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
},
{
"field": "endpointIp",
"type": "ip",
"value": [
"xxx.xxx.xxx"
]
}
]
}
],
"entityType": "endpoint",
"entityName": "xxx(xxx.xxx.xxx)",
"endpoint": {
"name": "xxx",
"guid": "d1dd7e61",
"ips": [
"2xx.xxx.xxx"
]
}
}
Inside 'filters' there are two entries, one critical and one medium, both with a 'name' variable.
I want to print only the first name, but when I print 'name' it returns both names.
How do I print only the first one?
If I put the print inside the loop over filters, it returns both names.
If I put the print inside the loop over detections, it only returns the second 'name', and that's not what I want.
If you only want to print the name of the first filter, there's no need to iterate over the filters; just index the list and print the value under "name":
for d in r['detections']:
    print(d['filters'][0]['name'])
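If "first" really means "the critical one", selecting by level instead of position is more robust against reordering. A sketch, assuming `r` is the parsed JSON (trimmed down here to the fields that matter):

```python
r = {"detections": [{"filters": [
    {"level": "critical", "name": "Possible Right-To-Left Override Attack"},
    {"level": "medium", "name": "Malicious Software"},
]}]}

for d in r["detections"]:
    # next() yields the first filter whose level is "critical", or None.
    critical_name = next(
        (f["name"] for f in d["filters"] if f.get("level") == "critical"),
        None,
    )
    print(critical_name)
```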
I am having some trouble learning about new methods of flattening/aggregating nested object data in python. My current implementation is rather slow, and I want to know some approaches to speed up processing. Consider that I have a dataset of donations defined as:
donations = [
{
"amount": 100,
"organization": {
"name": "Org 1",
"total_budget": 8000,
"states": [
{
"name": "Maine",
"code": "ME"
},
{
"name": "Massachusetts",
"code": "MA"
}
]
}
},
{
"amount": 5000,
"organization": {
"name": "Org 2",
"total_budget": 10000,
"states": [
{
"name": "Massachusetts",
"code": "MA"
}
]
}
},
{
"amount": 5000,
"organization": {
"name": "Org 1",
"total_budget": 8000,
"states": [
{
"name": "Maine",
"code": "ME"
},
{
"name": "Massachusetts",
"code": "MA"
}
]
}
}
]
The relationship of these objects is such that a donation is related to a single organization, and an organization can be related to one or more states.
I additionally can get just the organization dataset as:
organizations = [
{
"name": "Org 1",
"total_budget": 8000,
"states": [
{
"name": "Maine",
"code": "ME"
},
{
"name": "Massachusetts",
"code": "MA"
}
]
},
{
"name": "Org 2",
"total_budget": 10000,
"states": [
{
"name": "Massachusetts",
"code": "MA"
}
]
}
]
The output I am looking to achieve is an aggregation, by state, of the total donations and total budget, where the donation amounts and the organization's total budget is evenly distributed among all states it is associated with. Example for the above dataset:
results = {
"ME": {
"name": "Maine",
"total_donations": 2550,
"total_budget": 4000
},
"MA": {
"name": "Massachusetts",
"total_donations": 7550,
"total_budget": 14000
}
}
What I have tried so far is to iterate over each donation and organization with for loops and accumulate the totals into a defaultdict:
from collections import defaultdict

def get_stats():
    return {"total_donations": 0, "total_budget": 0, "name": ""}

results = defaultdict(get_stats)

for donation in donations:
    for state in donation["organization"]["states"]:
        results[state["code"]]["total_donations"] += donation["amount"] / len(donation["organization"]["states"])

for organization in organizations:
    for state in organization["states"]:
        results[state["code"]]["total_budget"] += organization["total_budget"] / len(organization["states"])
        results[state["code"]]["name"] = state["name"]
I was thinking about using map/reduce here, but I didn't get the sense that those would improve performance. Any advice here would be super appreciated.
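The loops above are already linear in the input size, so map/reduce won't change the big-O; what can help is hoisting the repeated `len()` calls and divisions out of the inner loop so each share is computed once per donation/organization rather than once per state. A sketch of that, on the example data:

```python
from collections import defaultdict

donations = [
    {"amount": 100, "organization": {"name": "Org 1", "total_budget": 8000,
        "states": [{"name": "Maine", "code": "ME"},
                   {"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 2", "total_budget": 10000,
        "states": [{"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 1", "total_budget": 8000,
        "states": [{"name": "Maine", "code": "ME"},
                   {"name": "Massachusetts", "code": "MA"}]}},
]
organizations = [
    {"name": "Org 1", "total_budget": 8000,
     "states": [{"name": "Maine", "code": "ME"},
                {"name": "Massachusetts", "code": "MA"}]},
    {"name": "Org 2", "total_budget": 10000,
     "states": [{"name": "Massachusetts", "code": "MA"}]},
]

results = defaultdict(lambda: {"total_donations": 0, "total_budget": 0, "name": ""})

for donation in donations:
    states = donation["organization"]["states"]
    share = donation["amount"] / len(states)   # split computed once per donation
    for state in states:
        results[state["code"]]["total_donations"] += share

for organization in organizations:
    states = organization["states"]
    share = organization["total_budget"] / len(states)
    for state in states:
        entry = results[state["code"]]          # single dict lookup per state
        entry["total_budget"] += share
        entry["name"] = state["name"]
```

If the real bottleneck is dataset size rather than these constant factors, flattening into a DataFrame and using a grouped aggregation would be the next step to try.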
I have a JSON file that I need to read in a structured way so I can insert each value into its respective database column, but inside the "customFields" tag the fields change index between records. For example, "Tribe / Customer" can be at index 0 (row['customFields'][0]) in one JSON block and at index 3 (row['customFields'][3]) in another, so I tried to read the data by field name with row['customFields']['Tribe / Customer'], but I got the error below:
TypeError: list indices must be integers or slices, not str
Script:
import json

def getCustomField(ModelData):
    for row in ModelData["data"]["squads"][0]["cards"]:
        print(row['identifier'],
              row['customFields']['Tribe / Customer'],
              row['customFields']['Stopped with'],
              row['customFields']['Sub-Activity'],
              row['customFields']['Activity'],
              row['customFields']['Complexity'],
              row['customFields']['Effort'])

if __name__ == "__main__":
    with open('test.json') as f:
        json_file = json.load(f)
    getCustomField(json_file)
JSON:
{
"data": {
"squads": [
{
"name": "TESTE",
"cards": [
{
"identifier": "0102",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
},
{
"identifier": "0103",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
}
]
}
]
}
}
You'll have to parse the list of custom fields into something you can access by name. Since you're accessing multiple entries from the same list, a dictionary is the most appropriate choice.
for row in ModelData["data"]["squads"][0]["cards"]:
    custom_fields_dict = {field['name']: field['value'] for field in row['customFields']}
    print(row['identifier'],
          custom_fields_dict['Tribe / Customer'],
          ...
          )
If you only wanted a single field, you could traverse the list looking for a match, but doing that repeatedly would be less efficient.
I'm skipping over dealing with missing fields - you'd probably want to use custom_fields_dict.get('Tribe / Customer', some_reasonable_default) if there's any possibility of a field not being present in the JSON list.
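Putting the pieces together, a runnable sketch that uses a default for absent fields (the trimmed-down card data and the "n/a" default are illustrative, not from the original file):

```python
cards = [
    {"identifier": "0102",
     "customFields": [{"name": "Tribe / Customer", "value": "TESTE 1"},
                      {"name": "Effort", "value": "20"}]},
    {"identifier": "0103",
     "customFields": [{"name": "Effort", "value": "20"},
                      {"name": "Tribe / Customer", "value": "TESTE 1"}]},
]

rows = []
for card in cards:
    # Rebuild the list of {"name": ..., "value": ...} pairs as a dict,
    # so field order within customFields no longer matters.
    fields = {f["name"]: f["value"] for f in card["customFields"]}
    rows.append((card["identifier"],
                 fields.get("Tribe / Customer", "n/a"),
                 fields.get("Stopped with", "n/a")))   # absent -> default

print(rows)
```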