I am currently trying to have Python parse JSON similar to the one at https://petition.parliament.uk/petitions/560216.json.
My problem is that the data I need is deeply nested and I don't know how to tell Python which part to take.
A simplified version of the data I need is below:
{
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
}
]
}
}
}
I am trying to pull the "signature_count" part.
The code below collects what you asked for into a list.
data = {
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
},
]
}
}
}
counts = [x['signature_count'] for x in data['data']['attributes']['signatures_by_country']]
print(counts)
Output:
[1, 2]
To get the count by country:
counts = [{x['name']:x['signature_count']} for x in data['data']['attributes']['signatures_by_country']]
Output:
[{'Afghanistan': 1}, {'Algeria': 2}]
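If you want these numbers straight from the live endpoint rather than a hard-coded dict, here is a minimal sketch using requests (assuming the petition JSON has the same shape as the simplified sample above):
import requests

url = 'https://petition.parliament.uk/petitions/560216.json'
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

by_country = data['data']['attributes']['signatures_by_country']
counts = [x['signature_count'] for x in by_country]              # e.g. [1, 2, ...]
by_name = {x['name']: x['signature_count'] for x in by_country}  # e.g. {'Afghanistan': 1, ...}
print(sum(counts), by_name)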
Related
I have this document in MongoDB:
{
"_id": {
"$oid": "62644af0368cb0a46d7c2a95"
},
"insertionData": "23/04/2022 19:50:50",
"ipfsMetadata": {
"Name": "data.json",
"Hash": "Qmb3FWgyJHzJA7WCBX1phgkV93GiEQ9UDWUYffDqUCbe7E",
"Size": "431"
},
"metadata": {
"sessionDate": "20220415 17:42:55",
"dataSender": "user345",
"data": {
"height": "180",
"weight": "80"
},
"addtionalInformation": [
{
"name": "poolsize",
"value": "30m"
},
{
"name": "swimStyle",
"value": "mariposa"
},
{
"name": "modality",
"value": "swim"
},
{
"name": "gender-title",
"value": "schoolA"
}
]
},
"fileId": {
"$numberLong": "4"
}
}
I want to update an element of the nested array, for instance the one whose name is gender-title. Its value is currently schoolA and I want to change it to the value given in the body (adultos). I pass the fileId number as a parameter in the request, and in the body I pass this:
PUT request: localhost/sessionUpdate/4
and body:
{
"name": "gender-title",
"value": "adultos"
}
Flask:
@app.route('/sessionUpdate/<string:a>', methods=['PUT'])
def sessionUpdate(a):
    datas = request.json
    r = str(datas['name'])
    r2 = str(datas['value'])
    print(r, r2)
    r3 = collection.update_one({'fileId': a, 'metadata.addtionalInformation': r}, {'$set': {'metadata.addtionalInformation.$.value': r2}})
    return str(r3), 200
I'm getting the 200, but the document doesn't update with the new value.
As you are using the positional operator $ to work with your array, make sure your selection query targets an array element. You can see in the query below that it targets the metadata.addtionalInformation array with the condition name: "gender-title".
db.collection.update({
  "fileId": 4,
  "metadata.addtionalInformation.name": "gender-title"
},
{
  "$set": {
    "metadata.addtionalInformation.$.value": "junior"
  }
})
Here is the Mongo playground for your reference.
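Carried back into your Flask handler, a minimal sketch of the corrected update might look like this (assuming fileId is stored as a number, so the URL parameter is declared as an int, and matching the array element by its name field):
@app.route('/sessionUpdate/<int:file_id>', methods=['PUT'])
def sessionUpdate(file_id):
    body = request.json
    # select the array element whose "name" matches, then update that element's "value"
    result = collection.update_one(
        {'fileId': file_id, 'metadata.addtionalInformation.name': body['name']},
        {'$set': {'metadata.addtionalInformation.$.value': body['value']}}
    )
    return str(result.modified_count), 200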
I am new to Python; I've worked with other languages. I've written this code in Java and it works, but now I must do it in Python. I have a JSON document with 3 levels; the first two are usages and resources, and I want to count the names on the third level. I've seen several examples but I can't get it done.
import json
data = {
"startDate": "2019-06-23T16:07:21.205Z",
"endDate": "2019-07-24T16:07:21.205Z",
"status": "Complete",
"usages": [
{
"name": "PureCloud Edge Virtual Usage",
"resources": [
{
"name": "Edge01-VM-GNS-DemoSite01 (1f279086-a6be-4a21-ab7a-2bb1ae703fa0)",
"date": "2019-07-24T09:00:28.034Z"
},
{
"name": "329ad5ae-e3a3-4371-9684-13dcb6542e11",
"date": "2019-07-24T09:00:28.034Z"
},
{
"name": "e5796741-bd63-4b8e-9837-4afb95bb0c09",
"date": "2019-07-24T09:00:28.034Z"
}
]
},
{
"name": "PureCloud for SmartVideo Add-On Concurrent",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
},
{
"name": "PureCloud 3 Concurrent User Usage",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
},
{
"name": "PureCloud Skype for Business WebSDK",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
}
],
"selfUri": "/api/v2/billing/reports/billableusage"
}
cantidadDeLicencias = 0
cantidadDeUsages = len(data['usages'])
for x in range(cantidadDeUsages):
    temporal = data[x]
    cantidadDeResources = len(temporal['resource'])
    for z in range(cantidadDeResources):
        print(x)
What changes do I have to make? Or maybe I should take another approach? Thanks in advance.
Update
Code that works
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias = cantidadDeLicencias + len(usage['resources'])
print(cantidadDeLicencias)
You can do this:
for usage in data['usages']:
    print(len(usage['resources']))
If you want to know the total number of names across all the resources levels, counting duplicated names (e.g. "jaguilera#gns.com.co" appears more than once in your data), then just iterate over the first level (usages) and sum the size of each array:
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias += len(usage['resources'])
print(cantidadDeLicencias)
If you don't want to count duplicates, then use a set and iterate over each resources array
cantidadDeLicencias_set = set()  # {} would create a dict, not a set
for usage in data['usages']:
    for resource in usage['resources']:
        cantidadDeLicencias_set.add(resource['name'])
print(len(cantidadDeLicencias_set))
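Both counts can also be written as one-liners; this is just a compact restatement of the loops above:
total = sum(len(usage['resources']) for usage in data['usages'])
unique = len({resource['name'] for usage in data['usages'] for resource in usage['resources']})
print(total, unique)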
TASK:
I am using an API call to get JSON data from our TeamCity CI tool.
We need to identify all the builds which are using an old version of MSBuild.
We can identify them from this part of the API call data:
{
"name": "msbuild_version",
"value": "15.0"
}
At the moment I am saving the entire API call data to a file; I will later integrate the API call into the same script.
Now to the question at hand: how can I filter on the above property, i.e. msbuild_version, to find msbuild_version < 15.0 (i.e. all MSBuild versions less than 15.0) and display the corresponding 'id' and 'projectName' under 'buildType'? E.g.
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds",
Here is a part of the JSON data file:
{
"project": [{
"id": "_Root",
"buildTypes": {
"buildType": []
}
}, {
"id": "AI_BTool_BToolBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_DraftBuild",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_213",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [ {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x64"
}, {
"name": "targets",
"value": "Build"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}, {
"id": "RUNNER_228",
"name": "temp",
"type": "VS.Solution",
"properties": {
"property": [{
"name": "build-file-path",
"value": "x"
}, {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "vs.version",
"value": "vs2019"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_ContinuousBuildWithNexusI",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x86"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_CiVARNewSolutionContinuousBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuilds_version",
"value": "15.0"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds_CiVARIngestionWindowsServiceNonReleaseBranchBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [{
"id": "RUNNER_22790",
"name": "Nuget Installer",
"type": "jb.nuget.installer",
"properties": {
"property": [{
"name": "nuget.path",
"value": "%teamcity.tool.NuGet.CommandLine.4.9.2%"
}, {
"name": "msbuilds_version",
"value": "16.0"
}, {
"name": "nuget.use.restore",
"value": "restore"
}, {
"name": "sln.path",
"value": "VAR.sln"
}, {
"name": "teamcity.step.mode",
"value": "default"
}]
}
}]
}
}]
}
}]
}
My solution so far
And here is my code snippet so far:
import json

with open('UnArchivedBuilds.txt') as api_call:
    read_content = json.load(api_call)

#for project in read_content['project']:
#    print (project.get('buildTypes'))

for project in read_content['project']:
    # print (project['id'])
    print (project['buildTypes']['buildType'])
I am not able to work out how to walk the JSON hierarchy to print the relevant data (i.e. id and projectName) wherever msbuild_version is less than 15.0.
I had a look at your JSON data, which was broken. In order to work with the snippet you provided, I fixed the malformed data and removed unneeded parts to reduce clutter:
{
"project": [{
"buildTypes": {
"buildType": [{
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds"
},
{
"id": "AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration",
"projectName": "A Integration / B Tool / VAR Builds"
}]
}
}]
}
As per my comment above, I suggested recursion or a schema validator to boil the JSON data down. However, as a quick-and-dirty approach, and due to the odd structure of your dataset, I decided to iterate over some parts of your data multiple times. Iterating over the same data repeatedly is quite ugly and considered bad practice in most cases.
Assuming the data is stored in a file called input.json, the following snippet should give you the desired output:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json

with open('input.json') as f:
    data = json.load(f)

projects = (element for element in data.get('project'))
build_types = (element.get('buildTypes') for element in projects)
build_types = (element.get('buildType') for element in build_types)

for item in build_types:
    for element in item:
        identifier = element.get('id')
        project_name = element.get('projectName')
        print('{} --> {}'.format(identifier, project_name))
Printing:
AIntegration_BTool_BToolBuilds_DraftBuild --> A Integration / B Tool / VAR Builds
AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration --> A Integration / B Tool / VAR Builds
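That prints every build type. To actually filter on the MSBuild version, a sketch against the full (unreduced) structure from the question could walk down to the step properties and compare the value numerically; the 15.0 threshold and the defaults for missing keys are my assumptions:
import json

with open('UnArchivedBuilds.txt') as f:
    data = json.load(f)

for project in data.get('project', []):
    for build_type in project.get('buildTypes', {}).get('buildType', []):
        for step in build_type.get('steps', {}).get('step', []):
            for prop in step.get('properties', {}).get('property', []):
                # flag any step whose msbuild_version property is below 15.0
                if prop.get('name') == 'msbuild_version' and float(prop['value']) < 15.0:
                    print(build_type.get('id'), '-->', build_type.get('projectName'))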
I have a CSV file with 4 columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I provided a header row in the example above, but in my case the data has no header.)
How can I convert the above CSV file to JSON in the format below?
1st column: maps to "type": "Metal"
2nd column: "MetalType" -> "values" -> "value": "abc123451"
3rd column: "Date" -> "values" -> "value": "2018-05-26"
4th column: "Acknowledge" -> "values" -> "value": "Success"
All remaining fields take default values.
As per the format below:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}
Even though jww is right, I built something for you:
First I import the CSV using pandas:
import pandas as pd

df = pd.read_csv('data.csv')
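Since your real file has no header row (as you noted), the read probably needs header=None plus explicit column names; reusing the pandas import above, the names below just mirror the example header, so adjust them as needed:
df = pd.read_csv('data.csv', header=None,
                 names=['type', 'MetalType', 'Date', 'Acknowledge'])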
then I create a template for the dictionaries you want to add:
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
Now you just need to fill in the dictionary:
import copy

for i in range(len(df)):
    # deep-copy the template so each row gets its own dictionary
    d = copy.deepcopy(template)
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)
I know my way of iterating over the df is kind of ugly, maybe someone knows a cleaner way.
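One slightly cleaner variant iterates with itertuples and dumps the result at the end; this is only a sketch assuming the column names above:
import copy
import json

for row in df.itertuples(index=False):
    d = copy.deepcopy(template)  # fresh copy per row
    d['type'] = row.type
    d['data']['attributes']['MetalType']['values'][0]['value'] = row.MetalType
    d['data']['attributes']['Date']['values'][0]['value'] = row.Date
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = row.Acknowledge
    d_json['entities'].append(d)

print(json.dumps(d_json, indent=4))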
Cheers!
I have a dictionary which contains multiple keys and values, and the values themselves also contain key-value pairs. I can't work out how to create the dynamic JSON from this dictionary in Python. Here's the dictionary:
image_dict = {"IMAGE_1":{"img0":"IMAGE_2","img1":"IMAGE_3","img2":"IMAGE_4"},"IMAGE_2":{"img0":"IMAGE_1", "img1" : "IMAGE_3"},"IMAGE_3":{"img0":"IMAGE_1", "img1":"IMAGE_2"},"IMAGE_4":{"img0":"IMAGE_1"}}
My expected result looks like this:
{
"data": [
{
"image": {
"imageId": {
"id": "IMAGE_1"
},
"link": {
"target": {
"id": "IMAGE_2"
},
"target": {
"id": "IMAGE_3"
},
"target": {
"id": "IMAGE_4"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_2"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_3"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_3"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_2"
}
}
},
"updateData": "link"
} ,
{
"image": {
"imageId": {
"id": "IMAGE_4"
},
"link": {
"target": {
"id": "IMAGE_1"
}
}
},
"updateData": "link"
}
]
}
I tried to solve it, but I didn't get the expected result.
import json

result = {"data": []}
for k, v in sorted(image_dict.items()):
    for a in sorted(v.values()):
        result["data"].append({"image": {"imageId": {"id": k},
                                         "link": {"target": {"id": a}}},
                               "updateData": "link"})
print(json.dumps(result, indent=4))
In Python dictionaries you can't have two values with the same key, so you can't have multiple targets all called "target"; you can index them instead. Also, I don't know what this question has to do with dynamic objects, but here's the code I got working:
import re

dict_res = {}
ind = 0
for image in image_dict:
    lin_ind = 0
    sub_dict = {'image' + str(ind): {'imageId': {'id': image}, 'link': {}}}
    for sub in image_dict[image].values():
        sub_dict['image' + str(ind)]['link'].update({'target' + str(lin_ind): {'id': sub}})
        lin_ind += 1
    dict_res.update(sub_dict)
    ind += 1
# collapse the indexed keys back to plain "image"/"target" in the string form
dict_res = re.sub(r'target\d+', 'target', re.sub(r'image\d+', 'image', str(dict_res)))
print(dict_res)
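Note that the regex trick above produces a string with repeated keys, which json.dumps can never emit. If the consumer can accept a list of targets instead of a repeated "target" key, here is a sketch that builds valid, serialisable JSON; the "targets" list is my own deviation from the expected shape:
import json

result = {"data": []}
for image_id, links in sorted(image_dict.items()):
    entry = {
        "image": {
            "imageId": {"id": image_id},
            # JSON objects cannot repeat the "target" key, so collect the ids in a list
            "link": {"targets": [{"id": target} for target in sorted(links.values())]},
        },
        "updateData": "link",
    }
    result["data"].append(entry)

print(json.dumps(result, indent=4))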