Converting a pandas DataFrame to nested JSON - Python

Here is sample data from a CSV file, where every generation is a child of the previous generation.
parent,gen1,gen2,gen3,gen4,gen5,gen6
query1,AggregateExpression,abc,def,emg,cdf,bcf
query1,And,cse,rds,acd,,
query2,Arithmetic,cbd,rsd,msd,,
query2,Average,as,vs,ve,ew,
query2,BinaryExpression,avsd,sfds,sdf,,
query2,Comparison,sdfs,sdfsx,,,
query3,Count,sfsd,,,,
query3,methods1,add,asd,fdds,sdf,sdf
query3,methods1,average,sdfs,bf,fd,
query4,methods2,distinct,cz,asd,ada,
query4,methods2,eq,sdfs,sdfxcv,sdf,rtyr
query4,methods3,eq,vcx,xcv,cdf,
I need to create a JSON file in the following format, where the parents are the index, the children are always lists of dictionaries, and the last generation carries a size equal to the number of times its parent appears in the previous generation.
Example of the first row breakdown:
{
  "name": "query1",
  "children": [
    {
      "name": "AggregateExpression",
      "children": [
        {
          "name": "abc",
          "children": [
            {
              "name": "def",
              "children": [
                {
                  "name": "emg",
                  "children": [
                    {
                      "name": "cdf",
                      "children": [
                        { "name": "bcf", "size": 1 }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
I have tried groupby() and to_json() but was not able to get the nested structure, and I am still struggling with the logic: do I need a lambda or a loop? Any suggestion or solution is welcome. Thanks.
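One way to build this with only the standard library is to insert each row's chain of names recursively, counting rows that end at each leaf as its size. This is a sketch, not the asker's code: the helper name `insert_path` and the inline two-row sample are illustrative; with pandas, `df.itertuples(index=False)` would yield the same per-row name chains.

```python
import csv
import io
import json

# Inline sample with the same shape as the question's CSV
CSV_TEXT = """parent,gen1,gen2,gen3,gen4,gen5,gen6
query1,AggregateExpression,abc,def,emg,cdf,bcf
query1,And,cse,rds,acd,,
"""

def insert_path(children, path):
    """Insert one row's chain of names into a list of {"name": ...} dicts."""
    name, rest = path[0], path[1:]
    # Reuse an existing sibling with the same name, or create it
    node = next((c for c in children if c["name"] == name), None)
    if node is None:
        node = {"name": name}
        children.append(node)
    if rest:
        insert_path(node.setdefault("children", []), rest)
    else:
        # Leaf: size counts how many rows end at this name
        node["size"] = node.get("size", 0) + 1

roots = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    names = [v for v in row.values() if v]  # drop empty trailing generations
    insert_path(roots, names)

print(json.dumps(roots, indent=2))
```

`json.dumps(roots, indent=2)` then produces exactly the nested name/children/size shape shown above, with both rows of query1 merged under one root.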


Python JSON array left join update

I have a nested JSON array and a separate second array.
I would like to perform the equivalent of a SQL UPDATE with a left join.
In other words, keep all items from the main JSON, and where the same item (key='order') appears in the secondary one, update/append its values in the main.
I could obviously achieve this by looping, but I am really looking for a more elegant and efficient solution.
Most examples of 'merging' JSON I've seen involve appending new items; very little covers 'updating'.
Any pointers appreciated :)
Main JSON object with nested array 'steps'
{
  "manifest_header": {
    "name": "test"
  },
  "steps": [
    {
      "order": "100",
      "value": "some value"
    },
    {
      "order": "200",
      "value": "some other value"
    }
  ]
}
JSON Array with values to add
{
  "steps": [
    {
      "order": "200",
      "etag": "aaaaabbbbbccccddddeeeeefffffgggg"
    }
  ]
}
Desired Result:
{
  "manifest_header": {
    "name": "test"
  },
  "steps": [
    {
      "order": "100",
      "value": "some value"
    },
    {
      "order": "200",
      "value": "some other value",
      "etag": "aaaaabbbbbccccddddeeeeefffffgggg"
    }
  ]
}
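One short approach (a sketch; the variable names `main`, `secondary`, and `by_order` are mine): index the secondary steps by the join key 'order', then `dict.update` the matching main steps in place, leaving unmatched items untouched.

```python
main = {
    "manifest_header": {"name": "test"},
    "steps": [
        {"order": "100", "value": "some value"},
        {"order": "200", "value": "some other value"},
    ],
}
secondary = {
    "steps": [
        {"order": "200", "etag": "aaaaabbbbbccccddddeeeeefffffgggg"},
    ],
}

# Index the secondary steps by the join key, then update matching
# main steps in place; items present only in `main` are left as-is.
by_order = {step["order"]: step for step in secondary["steps"]}
for step in main["steps"]:
    extra = by_order.get(step["order"])
    if extra is not None:
        step.update(extra)
```

Building the index once makes each lookup O(1), so the whole merge is linear in the number of steps rather than quadratic as a nested-loop join would be.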

jmespath: how do I find the values of a key in a nested dictionary?

I have an example json file. I need to extract all the values of the downloadUrl keys:
{
  "nodes": {
    "children": [
      {
        "id": "",
        "localizedName": "",
        "name": "Documents",
        "children": [
          {
            "id": "",
            "localizedName": "Brochures",
            "name": "Brochures",
            "items": [
              {
                "title": "Brochure",
                "downloadUrl": "/documents/brochure-en.pdf",
                "fileType": "pdf",
                "fileSize": "2.9 MB"
              }
            ]
          },
          {
            "id": "192194",
            "localizedName": "",
            "name": "Demonstrations",
            "items": [
              {
                "title": "Safety Poster",
                "downloadUrl": "safety-poster-en.pdf",
                "fileType": "pdf",
                "fileSize": "1.1 MB"
              }
            ]
          }
        ]
      }
    ]
  }
}
I'm trying to do this with this query:
jmespath.search('nodes[*].downloadUrl', file)
but the list of values is not displayed.
Where is the error?
Statically, your property is under:
nodes
  children [ ]
    children [ ]
      items [ ]
        downloadUrl
So a query giving you those values would be:
nodes.children[].children[].items[].downloadUrl
If you want something a little more dynamic (let's say the property names can change, but the level at which you will find downloadUrl won't), you could use this query:
*.*[][].*[][].*[?downloadUrl][][].downloadUrl
But sadly, querying an arbitrary structure, as you can in jq, is not something JMESPath supports at the moment.
You need to do something like:
jmespath.search("nodes.children[].children[].items[].downloadUrl", file)
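If you do end up needing a jq-style search at arbitrary depth, a small recursive walk in plain Python is one workaround. This is a sketch: the function name `find_key` and the trimmed sample document are illustrative, not from the question.

```python
def find_key(obj, key):
    """Recursively collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_key(item, key))
    return found

# Trimmed version of the document in the question
doc = {"nodes": {"children": [{"name": "Documents", "children": [
    {"name": "Brochures",
     "items": [{"title": "Brochure", "downloadUrl": "/documents/brochure-en.pdf"}]},
    {"name": "Demonstrations",
     "items": [{"title": "Safety Poster", "downloadUrl": "safety-poster-en.pdf"}]},
]}]}}

urls = find_key(doc, "downloadUrl")
```

Unlike the static JMESPath query, this keeps working if a document nests its children one level deeper or shallower.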

How to read a JSON file when the number of keys in a nested dictionary keeps changing

I have this nested JSON dictionary that I need to parse into a SQL table. The problem is that the number of tank entries (max is 4) in the nested dictionary changes with each site_id. Is there a way to read them?
{
  "data": [
    {
      "site_id": 30183,
      "city": "Seattle",
      "state": "US-WA",
      "tank": [
        { "id": "00001", "name": "Diesel" },
        { "id": "00002", "name": "Diesel" },
        { "id": "00003", "name": "Unleaded 89" }
      ]
    },
    {
      "site_id": 200942,
      "city": "Boise",
      "state": "ID-WA",
      "tank": [
        { "id": "00001", "name": "Diesel" },
        { "id": "00002", "name": "Unleaded 95" }
      ]
    }
  ]
}
Here is my current code:
for site in response['data']:
    row = []
    row.extend([site['site_id'], site['city'], site['state']])
    for tank in site['tank']:
        row.extend([tank['id'], tank['name']])
Any site_id that does not have all 4 tanks can have the missing values replaced with NULL.
I don't know how to modify the code to adjust to a different number of tank entries. Any suggestion helps! Thank you
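One way to get fixed-width rows is to pad with None (which maps to SQL NULL in most drivers) for each missing tank. This is a sketch under assumptions: `MAX_TANKS = 4` comes from the stated maximum, and the trimmed `response` sample quotes the ids so they parse.

```python
MAX_TANKS = 4  # stated maximum number of tanks per site

# Trimmed version of the response in the question (ids quoted so they parse)
response = {"data": [
    {"site_id": 30183, "city": "Seattle", "state": "US-WA",
     "tank": [{"id": "00001", "name": "Diesel"},
              {"id": "00002", "name": "Diesel"},
              {"id": "00003", "name": "Unleaded 89"}]},
    {"site_id": 200942, "city": "Boise", "state": "ID-WA",
     "tank": [{"id": "00001", "name": "Diesel"},
              {"id": "00002", "name": "Unleaded 95"}]},
]}

rows = []
for site in response["data"]:
    row = [site["site_id"], site["city"], site["state"]]
    tanks = site["tank"][:MAX_TANKS]
    for tank in tanks:
        row.extend([tank["id"], tank["name"]])
    # Pad sites with fewer than MAX_TANKS tanks using NULL placeholders
    row.extend([None, None] * (MAX_TANKS - len(tanks)))
    rows.append(row)
```

Every row then has 3 + 2 * MAX_TANKS columns regardless of how many tanks the site reports, so the rows can be inserted into one fixed-schema table.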

MongoDB: update many documents with each element in a given list

I have this DEVICE collection
[
  {
    "_id": ObjectId("60265a12f9bf1e3974dabe56"),
    "Name": "Device",
    "Configuration_ids": [
      ObjectId("60265a11f9bf1e3974dabe54"),
      ObjectId("60265a11f9bf1e3974dabe55")
    ]
  },
  {
    "_id": ObjectId("60265a92f9bf1e3974dabe64"),
    "Name": "Device2",
    "Configuration_ids": [
      ObjectId("60265a92f9bf1e3974dabe5a"),
      ObjectId("60265a92f9bf1e3974dabe5b")
    ]
  },
  {
    "_id": ObjectId("60265a92f9bf1e3974dabe65"),
    "Name": "Device3",
    "Configuration_ids": [
      ObjectId("60265a92f9bf1e3974dabe5e"),
      ObjectId("60265a92f9bf1e3974dabe5f")
    ]
  }
]
I need to update all the documents that match a list of device ids, pushing each element of a given configuration_ids list into the corresponding matched device. The two lists have the same length.
My solution is below, but can I do it in one single query?
device_ids = [
    ObjectId("60265a12f9bf1e3974dabe56"),
    ObjectId("60265a92f9bf1e3974dabe64"),
    ObjectId("60265a92f9bf1e3974dabe65")
]
configuration_ids = [
    ObjectId("60267d14bc2f40d0dec1de3b"),
    ObjectId("60267d14bc2f40d0dec1de3c"),
    ObjectId("60267d14bc2f40d0dec1de3d")
]
for i in range(0, len(device_ids)):
    update_devices = device_collection.update_one(
        {'_id': ObjectId(device_ids[i])},
        {'$push': {'Configuration_ids': configuration_ids[i]}}
    )
The result:
[
  {
    "_id": ObjectId("60265a12f9bf1e3974dabe56"),
    "Name": "Device",
    "Configuration_ids": [
      ObjectId("60265a11f9bf1e3974dabe54"),
      ObjectId("60265a11f9bf1e3974dabe55"),
      ObjectId("60267d14bc2f40d0dec1de3b")
    ]
  },
  {
    "_id": ObjectId("60265a92f9bf1e3974dabe64"),
    "Name": "Device2",
    "Configuration_ids": [
      ObjectId("60265a92f9bf1e3974dabe5a"),
      ObjectId("60265a92f9bf1e3974dabe5b"),
      ObjectId("60267d14bc2f40d0dec1de3c")
    ]
  },
  {
    "_id": ObjectId("60265a92f9bf1e3974dabe65"),
    "Name": "Device3",
    "Configuration_ids": [
      ObjectId("60265a92f9bf1e3974dabe5e"),
      ObjectId("60265a92f9bf1e3974dabe5f"),
      ObjectId("60267d14bc2f40d0dec1de3d")
    ]
  }
]
If you were hoping to use update_many to achieve this in a single update, then the short answer is you can't. update_many takes a single filter to determine which documents to update; in your example, each update is a different document id.
If you have a large number of these updates, and performance is an issue, consider using the bulk write operators.
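The bulk path mentioned above can be sketched as follows. To keep the snippet self-contained the ids are plain hex strings rather than ObjectId instances, and the pymongo calls are shown in comments; `ops` is a name I chose for the batch.

```python
# ObjectId hex values shown as plain strings so the sketch runs
# without bson/pymongo; in real code keep them as ObjectId(...).
device_ids = [
    "60265a12f9bf1e3974dabe56",
    "60265a92f9bf1e3974dabe64",
    "60265a92f9bf1e3974dabe65",
]
configuration_ids = [
    "60267d14bc2f40d0dec1de3b",
    "60267d14bc2f40d0dec1de3c",
    "60267d14bc2f40d0dec1de3d",
]

# One update spec per (device, configuration) pair
ops = [
    {"filter": {"_id": dev}, "update": {"$push": {"Configuration_ids": conf}}}
    for dev, conf in zip(device_ids, configuration_ids)
]

# With pymongo, the whole batch goes to the server in one round trip:
#   from pymongo import UpdateOne
#   device_collection.bulk_write(
#       [UpdateOne(op["filter"], op["update"]) for op in ops]
#   )
```

This is still n updates logically, but bulk_write sends them in a single request, which is the main saving over calling update_one in a loop.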

3-level JSON count in Python

I am new to Python; I've worked with other languages. I wrote this code in Java and it works, but now I must do it in Python. I have a JSON with 3 levels, usages and resources being the first two, and I want to count the names on the third level. I've seen several examples but I can't get it done.
import json
data = {
    "startDate": "2019-06-23T16:07:21.205Z",
    "endDate": "2019-07-24T16:07:21.205Z",
    "status": "Complete",
    "usages": [
        {
            "name": "PureCloud Edge Virtual Usage",
            "resources": [
                {"name": "Edge01-VM-GNS-DemoSite01 (1f279086-a6be-4a21-ab7a-2bb1ae703fa0)", "date": "2019-07-24T09:00:28.034Z"},
                {"name": "329ad5ae-e3a3-4371-9684-13dcb6542e11", "date": "2019-07-24T09:00:28.034Z"},
                {"name": "e5796741-bd63-4b8e-9837-4afb95bb0c09", "date": "2019-07-24T09:00:28.034Z"}
            ]
        },
        {
            "name": "PureCloud for SmartVideo Add-On Concurrent",
            "resources": [
                {"name": "jpizarro#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "jaguilera#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "dcortes#gns.com.co", "date": "2019-07-15T15:06:09.203Z"}
            ]
        },
        {
            "name": "PureCloud 3 Concurrent User Usage",
            "resources": [
                {"name": "jpizarro#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "jaguilera#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "dcortes#gns.com.co", "date": "2019-07-15T15:06:09.203Z"}
            ]
        },
        {
            "name": "PureCloud Skype for Business WebSDK",
            "resources": [
                {"name": "jpizarro#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "jaguilera#gns.com.co", "date": "2019-06-25T04:54:17.662Z"},
                {"name": "dcortes#gns.com.co", "date": "2019-07-15T15:06:09.203Z"}
            ]
        }
    ],
    "selfUri": "/api/v2/billing/reports/billableusage"
}
cantidadDeLicencias = 0
cantidadDeUsages = len(data['usages'])
for x in range(cantidadDeUsages):
    temporal = data[x]
    cantidadDeResources = len(temporal['resource'])
    for z in range(cantidadDeResources):
        print(x)
What changes do I have to make? Or should I take another approach? Thanks in advance
Update
Code that works
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias = cantidadDeLicencias + len(usage['resources'])
print(cantidadDeLicencias)
You can do this :
for usage in data['usages']:
    print(len(usage['resources']))
If you want the total number of names across the resources levels, counting duplicated names (e.g. "jaguilera#gns.com.co" appears more than once in your data), then just iterate over the first level (usages) and sum the size of each array:
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias += len(usage['resources'])
print(cantidadDeLicencias)
If you don't want to count duplicates, then use a set and iterate over each resources array
cantidadDeLicencias_set = set()
for usage in data['usages']:
    for resource in usage['resources']:
        cantidadDeLicencias_set.add(resource['name'])
print(len(cantidadDeLicencias_set))
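Both counts can also be written as one-liners with a generator expression and a set comprehension. The trimmed sample data below (usage-a, usage-b, alice, bob) is made up for illustration:

```python
data = {"usages": [
    {"name": "usage-a", "resources": [{"name": "alice"}, {"name": "bob"}]},
    {"name": "usage-b", "resources": [{"name": "bob"}]},
]}

# Total resources across all usages (duplicates counted)
total = sum(len(usage["resources"]) for usage in data["usages"])

# Distinct resource names across all usages
unique = len({res["name"] for usage in data["usages"] for res in usage["resources"]})
```

With this sample, total is 3 while unique is 2, because "bob" appears under both usages.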
