Unable to pull data from json using python

Unable to pull data from json using python - python

I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which im trying to parse using the json module in python. I am only intrested in getting the following information out of it.
Name Value
operational-status Value
health-state Value
Here is what i have tried.
in the below script data is the json returned from a webpage
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately i think i must be missing something as the above results in an error that indexes must be integers not string.
if I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.

json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']

You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list you'll have to index it positionally or iterate through it and check the values. If it's a dict you'll have to index it by it's keys. For example here is a function that get's the context and then iterates through it's attributes checking for a particular name.
def get_attribute(data, attribute):
for attrib in data['response']['context'][0]['attributes']:
if attrib['name'] == attribute:
return attrib['value']
return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'

json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational status" I see in there can be read with the following:
json['response']['context'][0]['attributes'][0]['operational-status']

Related

A Python script that can navigate a .json and export a .csv based on a search term

I want to take a .json of "_PRESET..." items and their "code-state"s with "actions" that contain other "code-state"s, "appearance"s, and "switch"s and turn it into .csv produced from the actions under a given "_PRESET...", including the "code-state"s and the "actions" listed under their individual entries.
This would allow a user to enter the "_PRESET..." name and receive a 3-column .csv file containing each action's "type", name, and "value". There are of course ways to export the entire .json easily, but I can't fathom a way to navigate it like is needed.
enters "_PRESET_Config_A" for
input.json:
{
"abc_data": {
"_PRESET_Config_A": {
"properties": {
"category": "configuration",
"name": "_PRESET_Config_A",
"collection": null,
"description": ""
},
"actions": {
"EN-R9": {
"type": "code_state",
"value": "on"
}
}
},
"PN4FP": {
"properties": {
"category": "uncategorized",
"name": "PN4FP",
"collection": null,
"description": ""
},
"actions": {
"E_xxxxxx_Default": {
"type": "appearance",
"value": "M_Red"
}
}
},
"HEDIS": {
"properties": {
"category": "uncategorized",
"name": "HEDIS",
"collection": null,
"description": ""
},
"actions": {
"E_xxxxxx_Default": {
"type": "appearance",
"value": "M_Purple"
}
}
},
"_PRESET_Config_B": {
"properties": {
"category": "configuration",
"name": "_PRESET_Config_A",
"collection": null,
"description": ""
},
"actions": {
"HEDIS": {
"type": "code_state",
"value": "on"
}
}
},
"EN-R9": {
"properties": {
"category": "uncategorized",
"name": "EN-R9",
"collection": null,
"description": ""
},
"actions": {
"PN4FP": {
"type": "code_state",
"value": "on"
},
"switch_StorageBin": {
"type": "switch",
"value": "00_w_Storage_Bin_R9"
}
}
}
}
}
Desired output.csv
type,name,value
code_state,EN-R9,on
code_state,PN4FP,on
appearance,E_xxxxxx_Default,M_Red
switch,switch_StorageBin,00_w_Storage_Bin_R9

How to browse and get only json position 0 in python [duplicate]

This question already has answers here:
How to extract data from dictionary in the list
(3 answers)
Closed 11 months ago.
I have the following json output.
"detections": [
{
"source": "detection",
"uuid": "50594028",
"detectionTime": "2022-03-27T06:50:56Z",
"ingestionTime": "2022-03-27T07:04:50Z",
"filters": [
{
"id": "F2058",
"unique_id": "3638f7c0",
"level": "critical",
"name": "Possible Right-To-Left Override Attack",
"description": "Possible Right-To-Left Override Detected in the Filename",
"tactics": [
"TA0005"
],
"techniques": [
"T1036.002"
],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
}
]
},
{
"id": "F2140",
"unique_id": "5a313874",
"level": "medium",
"name": "Malicious Software",
"description": "A malicious software was detected on an endpoint.",
"tactics": [],
"techniques": [],
"highlightedObjects": [
{
"field": "fileName",
"type": "filename",
"value": [
"1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
]
},
{
"field": "filePathName",
"type": "fullpath",
"value": "/exports/10_19/mail/12/91/rs001291-excluido-20193/new/1465940311.,S=473394(NONAMEFL(Z00057-PI‮fdp.exe))"
},
{
"field": "malName",
"type": "detection_name",
"value": "HEUR_RLOTRICK.A"
},
{
"field": "actResult",
"type": "text",
"value": [
"Passed"
]
},
{
"field": "scanType",
"type": "text",
"value": "REALTIME"
},
{
"field": "endpointIp",
"type": "ip",
"value": [
"xxx.xxx.xxx"
]
}
]
}
],
"entityType": "endpoint",
"entityName": "xxx(xxx.xxx.xxx)",
"endpoint": {
"name": "xxx",
"guid": "d1dd7e61",
"ips": [
"2xx.xxx.xxx"
]
}
}
Inside the 'filters' offset it brings me two levels, one critical and one medim, both with the variable 'name'.
I want to print only the first name, but when I print the 'name', it returns both names:
How do I print only the first one?
If I put print in for filters, it returns both names:
If I put print in for detections, it only returns the second 'name' and that's not what I want:

If you only want to print the name of the first filter, why iterate over it, just index it and print the value under "name":
for d in r['detections']:
print(d['filters'][0]['name'])

How to read fields without numeric index in JSON

I have a json file where I need to read it in a structured way to insert in a database each value in its respective column, but in the tag "customFields" the fields change index, example: "Tribe / Customer" can be index 0 (row['customFields'][0]) in a json block, and in the other one be index 3 (row['customFields'][3]), so I tried to read the data using the name of the row field ['customFields'] ['Tribe / Customer'], but I got the error below:
TypeError: list indices must be integers or slices, not str
Script:
def getCustomField(ModelData):
for row in ModelData["data"]["squads"][0]["cards"]:
print(row['identifier'],
row['customFields']['Tribe / Customer'],
row['customFields']['Stopped with'],
row['customFields']['Sub-Activity'],
row['customFields']['Activity'],
row['customFields']['Complexity'],
row['customFields']['Effort'])
if __name__ == "__main__":
f = open('test.json')
json_file = json.load(f)
getCustomField(json_file)
JSON:
{
"data": {
"squads": [
{
"name": "TESTE",
"cards": [
{
"identifier": "0102",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
},
{
"identifier": "0103",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
}
]
}
]
}
}

You'll have to parse the list of custom fields into something you can access by name. Since you're accessing multiple entries from the same list, a dictionary is the most appropriate choice.
for row in ModelData["data"]["squads"][0]["cards"]:
custom_fields_dict = {field['name']: field['value'] for field in row['customFields']}
print(row['identifier'],
custom_fields_dict['Tribe / Customer'],
...
)
If you only wanted a single field you could traverse the list looking for a match, but it would be less efficient to do that repeatedly.
I'm skipping over dealing with missing fields - you'd probably want to use get('Tribe / Customer', some_reasonable_default) if there's any possibility of the field not being present in the json list.

How to add data to a topic using AvroProducer

I have a topic with the following schema. Could someone help me out on how to add data to the different fields.
{
"name": "Project",
"type": "record",
"namespace": "abcdefg",
"fields": [
{
"name": "Object",
"type": {
"name": "Object",
"type": "record",
"fields": [
{
"name": "Number_ID",
"type": "int"
},
{
"name": "Accept",
"type": "boolean"
}
]
}
},
{
"name": "DataStructureType",
"type": "string"
},
{
"name": "ProjectID",
"type": "string"
}
]
}
I tried the following code. I get list is not iterable or list is out of range.
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer
AvroProducerConf = {'bootstrap.servers': 'localhost:9092','schema.registry.url': 'http://localhost:8081'}
value_schema = avro.load('project.avsc')
avroProducer = AvroProducer(AvroProducerConf, default_value_schema = value_schema)
while True:
avroProducer.produce(topic = 'my_topic', value = {['Object'][0] : "value", ['Object'] [1] : "true", ['DataStructureType'] : "testvalue", ['ProjectID'] : "123"})
avroProducer.flush()

It's not clear what you're expecting something like this to do... ['Object'][0] and keys of a dict cannot be lists.
Try sending this, which matches your Avro schema
value = {
'Object': {
"Number_ID", 1,
"Accept": true
},
'DataStructureType' : "testvalue",
'ProjectID' : "123"
}

Use Python to iterate through nested JSON data

TASK:
I am using a API call to get JSON data from our TeamCity CI tool.
We need to identify all those builds which are using old version of msbuild.
We can identify from this API call data
{
"name": "msbuild_version",
"value": "15.0"
}
At the moment i am saving the entire API call data to a file; however i will later integrate the API call to the same script.
Now to the question at hand; How can i filter this above property i.e. msbuild_version, to say msbuild_version < 15.0 (i.e. all msbuild less than version 15.0) and display the corresponding 'id' and 'projectName' under 'buildType'; e.g.
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds",
here is a part of the JSON data file:-
{
"project": [{
"id": "_Root",
"buildTypes": {
"buildType": []
}
}, {
"id": "AI_BTool_BToolBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_DraftBuild",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_213",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [ {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x64"
}, {
"name": "targets",
"value": "Build"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}, {
"id": "RUNNER_228",
"name": "temp",
"type": "VS.Solution",
"properties": {
"property": [{
"name": "build-file-path",
"value": "x"
}, {
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "vs.version",
"value": "vs2019"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_ContinuousBuildWithNexusI",
"projectName": "A I / B Tool / VAR Builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuild_version",
"value": "16.0"
}, {
"name": "run-platform",
"value": "x86"
}, {
"name": "teamcity.step.mode",
"value": "default"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds",
"buildTypes": {
"buildType": [{
"id": "AI_BTool_BToolBuilds_CiVARNewSolutionContinuousBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [ {
"id": "RUNNER_22791",
"name": "Build",
"type": "MSBuild",
"properties": {
"property": [{
"name": "msbuilds_version",
"value": "15.0"
}, {
"name": "toolsVersion",
"value": "15.0"
}]
}
}]
}
}, {
"id": "AI_BTool_BToolBuilds_VARApiBuilds_CiVARIngestionWindowsServiceNonReleaseBranchBuild",
"projectName": "A I / B Tool / VAR Builds / VAR API builds",
"steps": {
"step": [{
"id": "RUNNER_22790",
"name": "Nuget Installer",
"type": "jb.nuget.installer",
"properties": {
"property": [{
"name": "nuget.path",
"value": "%teamcity.tool.NuGet.CommandLine.4.9.2%"
}, {
"name": "msbuilds_version",
"value": "16.0"
}, {
"name": "nuget.use.restore",
"value": "restore"
}, {
"name": "sln.path",
"value": "VAR.sln"
}, {
"name": "teamcity.step.mode",
"value": "default"
}]
}
}]
}
}]
}
}]
}
My Solution till now
And my code snippet till now
import json
with open('UnArchivedBuilds.txt') as api_call:
read_content = json.load(api_call)
#for project in read_content['project']:
# print (project.get('buildTypes'))
for project in read_content['project']:
# print (project['id'])
print (project['buildTypes']['buildType'])
I am not able to decide on the hierarchy of the JSON to print the relevant data (i.e id and projectName) wherever msbuild_version is less than 15.0

I had a look at your JSON data which was broken. In order to work with the snippet you provided I fixed the malformed data and removed unneeded parts to decrease clutter:
{
"project": [{
"buildTypes": {
"buildType": [{
"id": "AIntegration_BTool_BToolBuilds_DraftBuild",
"projectName": "A Integration / B Tool / VAR Builds"
},
{
"id": "AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration",
"projectName": "A Integration / B Tool / VAR Builds"
}]
}
}]
}
As per my comment above I suggested recursion or using a schema validator to boil the JSON data down. However, as a quick-and-dirty approach and due to the weird structure of your dataset, I decided to do a multi-iteration over some parts of your data. Iterating over the same data is quite ugly and considered to be bad practice in most cases.
Assuming the data is stored in a file called input.json, the following snippet should give you the desired output:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
with open('input.json') as f:
data = json.load(f)
projects = (element for element in data.get('project'))
build_types = (element.get('buildTypes') for element in projects)
build_types = (element.get('buildType') for element in build_types)
for item in build_types:
for element in item:
identifier = element.get('id')
project_name = element.get('projectName')
print('{} --> {}'.format(identifier, project_name))
Printing:
AIntegration_BTool_BToolBuilds_DraftBuild --> A Integration / B Tool / VAR Builds
AIntegration_BTool_BToolBuilds_ContinuousBuildIntegration --> A Integration / B Tool / VAR Builds

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Unable to pull data from json using python - python

json['reponse']['context'] is a list, not a dict. The structure is not exactly what you think it is. For example, the only "operational status" I see in there can be read with the following: json['response']['context'][0]['attributes'][0]['operational-status']

Related

A Python script that can navigate a .json and export a .csv based on a search term

How to browse and get only json position 0 in python [duplicate]

How to read fields without numeric index in JSON

How to add data to a topic using AvroProducer

Use Python to iterate through nested JSON data

Categories

Resources