Parsing Json extracting key value python

Hi guys, I am trying to extract the value of the same key ("id") from every record in a long JSON response, but I keep getting:
KeyError: 'id'
I am not sure what I am doing wrong. I am accessing the data through a REST API.
This is my script:
from requests.auth import HTTPBasicAuth
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def countries():
    data = requests.get("https://10.24.21.4:8543/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
    rep = data.json()
    for cid in rep:
        cid = rep["id"]
        print(cid)
countries()
The response is rather long, but it looks like this. Every record has an "id" key, and I need the respective values:
{
"items": [
{
"name": "Afghanistan",
"displayName": "Afghanistan",
"meta": {
"type": "COUNTRY"
},
"id": "AF",
"links": {
"self": {
"path": "/api/netim/v1/countries/AF"
}
}
},
{
"name": "Albania",
"displayName": "Albania",
"meta": {
"type": "COUNTRY"
},
"id": "AL",
"links": {
"self": {
"path": "/api/netim/v1/countries/AL"
}
}
},
{
"name": "Algeria",
"displayName": "Algeria",
"meta": {
"type": "COUNTRY"
},
"id": "DZ",
"links": {
"self": {
"path": "/api/netim/v1/countries/DZ"
}
}
},
{
"name": "American Samoa",
"displayName": "American Samoa",
"meta": {
"type

I rewrote your function a little. You should now be able to get all the IDs from the JSON response. I suggest you look into the basics of dictionaries and lists in Python.
from requests.auth import HTTPBasicAuth
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
def countries():
    data = requests.get("https://10.24.21.4:8543/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
    rep = data.json()
    return [elem.get("id","") for elem in rep['items']]
print(countries())
Update:
If you wish to extract the value of the "path" key together with the value of the "id" key, you need a list of dictionaries in which every dictionary corresponds to a single record from the JSON.
The modified function is as follows:
def countries():
    data = requests.get("https://10.24.21.4:8543/api/netim/v1/countries/", verify=False, auth=HTTPBasicAuth("admin", "admin"))
    rep = data.json()
    return [{"id":elem.get("id",""),"path":elem["links"]["self"]["path"]} for elem in rep['items']]
get() returns a default value when the key is absent from the dictionary, so neither function fails if a record has no "id" key. The "path" lookup, however, uses plain indexing and raises a KeyError if "links", "self" or "path" is missing.
If you are sure that "links" will always be present, you can use the function above directly. Otherwise you will have to write a custom function that parses the "links" key and returns an empty string when it is missing or empty in the JSON.
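One possible shape for such a custom function is sketched below. The helper name extract_id_and_path and the sample data are made up for illustration; only the key names ("items", "id", "links", "self", "path") come from the response in the question. It falls back to an empty string whenever any level is missing or empty.

```python
def extract_id_and_path(record):
    """Return the id and self-link path of one record, using "" for anything missing."""
    # "or {}" also covers the case where the key exists but holds None or an empty value
    links = record.get("links") or {}
    self_link = links.get("self") or {}
    return {"id": record.get("id", ""), "path": self_link.get("path", "")}


# Hypothetical sample shaped like the response in the question
sample = {
    "items": [
        {"id": "AF", "links": {"self": {"path": "/api/netim/v1/countries/AF"}}},
        {"id": "AL", "links": {}},
        {"name": "Record without id or links"},
    ]
}

records = [extract_id_and_path(elem) for elem in sample.get("items", [])]
print(records)
```

Inside countries(), the return line could then become a list comprehension over rep['items'] that calls this helper.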

The response is not an array, it's a dictionary.
You want the "items" element of that dictionary:
for cid in rep['items']:
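To make the difference concrete, here is a small sketch that uses a hand-written dictionary in place of the real data.json() result (the two records are copied from the question, everything else is trimmed). Looping over the dictionary itself only yields its keys, which is why rep["id"] raised KeyError: the top level contains "items" and nothing else.

```python
# Hypothetical stand-in for rep = data.json()
rep = {
    "items": [
        {"name": "Afghanistan", "id": "AF"},
        {"name": "Albania", "id": "AL"},
    ]
}

# Iterating over the dict itself yields its keys, not the country records
print(list(rep))

# Iterating over rep["items"] yields one dict per country
ids = []
for country in rep["items"]:
    ids.append(country["id"])
print(ids)
```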

Related

how to retrieve data from json file using python

I'm making API requests to get JSON that is parsed and converted into data frames. The JSON may sometimes have empty fields. I am posting two possible cases: in the 1st JSON file the field I am looking for is filled in, and in the 2nd JSON file that field is empty.
1st json file:
print(resp2)
{
"entityId": "proc_1234",
"displayName": "oracle12",
"firstSeenTms": 1639034760000,
"lastSeenTms": 1650386100000,
"properties": {
"detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
"bitness": "64",
"jvmVendor": "IBM",
"metadata": [
{
"key": "COMMAND_LINE_ARGS",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0"
},
{
"key": "EXE_NAME",
"value": "java"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"
},
{
"key": "JAVA_MAIN_CLASS",
"value": "oracle.sysman.gcagent.tmmain.TMMain"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"
}
]
}
}
2nd Json file:
print(resp2)
{
"entityId": "PROCESS_GROUP_INSTANCE-FB8C65551916D57D",
"displayName": "Windows System",
"firstSeenTms": 1619147697131,
"lastSeenTms": 1653404640000,
"properties": {
"detectedName": "Windows System",
"bitness": "32",
"metadata": [],
"awsNameTag": "Windows System",
"softwareTechnologies": [
{
"type": "WINDOWS_SYSTEM"
}
],
"processType": "WINDOWS_SYSTEM"
}
}
As you can see, "metadata": [] is empty.
I need to extract entityId and detectedName and, if metadata has data, EXE_NAME and EXE_PATH as well. If the metadata section is empty, I still need to get entityId and detectedName from this JSON file and form a data frame.
So, I have done this:
#retrieve the detectedName value from the json
det_name = list(resp2.get('properties','detectedName').values())[0]
#retrieve EXE_NAME, EXE_PATH and entityId from the json. This part works when the metadata section has data
Procdf=(pd.json_normalize(resp2, record_path=['properties', 'metadata'], meta=['entityId']).drop_duplicates(subset=['key']).query("key in ['EXE_NAME','EXE_PATH']").assign(detectedName=det_name).pivot('entityId', 'key', 'value').reset_index())
#Add detectedName to the Procdf data frame
Procdf["detectedName"] = det_name
The above code snippet works when metadata has data. If it has no data ([]), I still need to create a data frame with entityId and detectedName, with EXE_NAME and EXE_PATH left empty.
How can I do this? Right now, when metadata is [], I get the error name 'key' is not defined and that JSON is skipped.

Why not create a new dict based on whether there's value for metadata or not?
Here's an example (this should work with both response types):
import pandas as pd
def find_value(response: dict, key: str) -> str:
    result = []
    try:
        for x in response['properties']['metadata']:
            if x['key'] == key:
                result.append(x['value'])
    except KeyError:
        return ""
    return result[0] if result else ""
def get_values(response: dict) -> dict:
    return {
        "entityId": response['entityId'],
        "displayName": response['displayName'],
        "EXE_NAME": find_value(response, 'EXE_NAME'),
        "EXE_PATH": find_value(response, 'EXE_PATH'),
    }
sample_response = {
"entityId": "PROCESS_GROUP_INSTANCE-FB8C65551916D57D",
"displayName": "Windows System",
"firstSeenTms": 1619147697131,
"lastSeenTms": 1653404640000,
"properties": {
"detectedName": "Windows System",
"bitness": "32",
"awsNameTag": "Windows System",
"metadata": [],
"softwareTechnologies": [
{
"type": "WINDOWS_SYSTEM"
}
],
"processType": "WINDOWS_SYSTEM"
}
}
print(pd.json_normalize(get_values(sample_response)))
Sample output for metadata being empty:
entityId displayName EXE_NAME EXE_PATH
0 PROCESS_GROUP_INSTANCE-FB8C65551916D57D Windows System
And one when metadata carries, well, data:
entityId ... EXE_PATH
0 proc_1234 ... /usr/local/oracle/oem/agent*c/agent_*/oracle_c...
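The question asked for detectedName rather than displayName, and for one data frame covering several responses. A minimal sketch of that variant follows. The helper name to_row is made up, the two sample responses are trimmed copies of the ones in the question, and keeping the first value when a key such as EXE_PATH appears twice is an assumption that mirrors the drop_duplicates call in the original attempt.

```python
def to_row(response):
    """Flatten one response into a row dict; missing fields become ""."""
    props = response.get("properties") or {}
    meta = {}
    for entry in props.get("metadata") or []:
        # setdefault keeps the first value when a key appears more than once
        meta.setdefault(entry.get("key"), entry.get("value", ""))
    return {
        "entityId": response.get("entityId", ""),
        "detectedName": props.get("detectedName", ""),
        "EXE_NAME": meta.get("EXE_NAME", ""),
        "EXE_PATH": meta.get("EXE_PATH", ""),
    }


# Trimmed copies of the two responses shown in the question
with_metadata = {
    "entityId": "proc_1234",
    "properties": {
        "detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
        "metadata": [
            {"key": "EXE_NAME", "value": "java"},
            {"key": "EXE_PATH", "value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"},
            {"key": "EXE_PATH", "value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"},
        ],
    },
}
without_metadata = {
    "entityId": "PROCESS_GROUP_INSTANCE-FB8C65551916D57D",
    "properties": {"detectedName": "Windows System", "metadata": []},
}

rows = [to_row(r) for r in (with_metadata, without_metadata)]
for row in rows:
    print(row)
```

With pandas installed, pd.DataFrame(rows) should turn these rows into a single data frame with one line per response.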

How to get a nested JSON object from a REST response in Python

I'm making a REST call using the Python requests library as such:
response = requests.get(...)
data = response.json()
The JSON returned is complex with lots of nested objects. Here is a summary:
{
"links": [
{
"rel": "self",
"href": "http://pseudo.com/iam/governance/selfservice/api/v1/accounts"
}
],
"accounts": [
{
"links": [
{
"rel": "self",
"href": "http://pseudo.com/iam/governance/selfservice/api/v1/accounts"
}
],
"accountId": "73",
"userId": "1005",
"appInstanceId": "1",
"requestId": "",
"status": "Provisioning",
"accountType": "Unknown",
"policyKey": "",
"processInstanceKey": "201",
"provisionedBy": "1",
"provisionedByMechanism": "Direct Provision",
"provisionedOnDate": "2016-03-22",
"riskSummary": 0,
"accountDescription": "201",
"validFromDate": "2016-03-22",
"normalizeData": {
},
"accountData": {
}
}
]
}
The only library I have imported thus far is import requests.
How can I retrieve the value from the key "accountId" from the response above?
Is this possible with only the requests library or do I need to import the json library too?

data['accounts'] is a list of dicts, so you want to iterate over it or use an index to access a specific account.
for acc in data['accounts']:
    print(acc['accountId']) # or print(acc.get('accountId'))
Working with JSON objects and arrays is no different from working with dicts and lists - that is what they are parsed into. response.json() has already done the parsing, so importing the json library is not required for this.

Try this:
account_id = data['accounts'][0]['accountId']
If there could be multiple accounts and some might not have an accountId, you can try this:
account_id = next((account['accountId'] for account in data['accounts'] if 'accountId' in account), None)
If you want to get all accountIds from the accounts, then try this:
account_ids = [account.get('accountId') for account in data['accounts']]
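The behaviour of these variants is easiest to see side by side. The sketch below uses a made-up data dictionary with one account that lacks an accountId. Only the "accounts" and "accountId" key names come from the question.

```python
# Hypothetical stand-in for data = response.json()
data = {
    "accounts": [
        {"status": "Disabled"},  # no accountId on purpose
        {"accountId": "73", "userId": "1005"},
        {"accountId": "74", "userId": "1006"},
    ]
}

# First accountId that exists, or None if no account has one
first_with_id = next(
    (account["accountId"] for account in data["accounts"] if "accountId" in account),
    None,
)

# One entry per account; get() yields None where the key is missing
all_ids = [account.get("accountId") for account in data["accounts"]]

# Only the ids that are actually present
present_ids = [a["accountId"] for a in data["accounts"] if "accountId" in a]

print(first_with_id, all_ids, present_ids)
```

Note that data['accounts'][0]['accountId'] would raise a KeyError on this sample, because the first account has no such key.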

Extract data from API call and save file

After making a POST call to an API, I want to extract specific key/value pairs and then save them to a text file.
What has been done so far:
(1) REST API call that returns a list
import requests
import json
#API details
url = "http://192.168.1.100:9792/api/scan"
body = json.dumps({"service":"scan", "user_id":"1", "action":"read_all", "code":"0"})
headers = {'Content-Type': 'application/json'}
#Making http post request
response = requests.post(url, headers=headers, data=body, verify=False)
#Decode response.json() method to a python dictionary for data process utilization
dictData = response.json()
#Json string
json_str = json.dumps(dictData)
print(json_str)
The output of print(json_str) is as below:
{
"node": [
{
"id": "123",
"ip": "10.101.10.1",
"model": "md1",
"type": "basic",
"name": "aaa"
},
{
"id": "456",
"ip": "10.101.10.2",
"model": "sp3",
"type": "extra",
"name": "bbb"
},
{
"id": "789",
"ip": "1.1.1.1",
"model": "md1",
"type": "basic",
"name": "ccc"
},
{
"id": "101",
"ip": "2.2.2.2",
"model": "advance",
"type": "sw1",
"name": "ddd"
}
],
"status": "success"
}
(2) Extract specific key/value pairs. This is where I'm getting an error when reading the key/value from the list:
for i in json_str["node"]:
    if i["type"]=="basic" or i["type"]=="sw1" :
        print(i["name"],i["ip"], i["type"])
I'm getting this error:
for i in json_str["node"]:
TypeError: string indices must be integers, not str
I tried changing it to json_str[0], but it still doesn't return the key/value that I want.
Please assist further. Thanks.

Just go back to using dictData, as it is already a dictionary:
for i in dictData["node"]:

You dumped the JSON into a str.
To work with a dict again, first load the JSON string and then try:
json_str = json.dumps(dictData)
json_dict = json.loads(json_str)
print(json_dict)
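Neither answer covers the second half of the question, saving the extracted values to a text file. A minimal sketch is below, assuming one space-separated line per node is an acceptable file format. The function name save_matching_nodes, the output file name nodes.txt and the use of the temp directory are all illustrative choices, and dict_data is a trimmed copy of the response shown in the question rather than a live API result.

```python
import os
import tempfile

# Trimmed copy of the response from the question (stand-in for response.json())
dict_data = {
    "node": [
        {"id": "123", "ip": "10.101.10.1", "model": "md1", "type": "basic", "name": "aaa"},
        {"id": "456", "ip": "10.101.10.2", "model": "sp3", "type": "extra", "name": "bbb"},
        {"id": "789", "ip": "1.1.1.1", "model": "md1", "type": "basic", "name": "ccc"},
        {"id": "101", "ip": "2.2.2.2", "model": "advance", "type": "sw1", "name": "ddd"},
    ],
    "status": "success",
}


def save_matching_nodes(data, path, wanted_types=("basic", "sw1")):
    """Write name, ip and type of every matching node to a text file, one node per line."""
    lines = [
        f'{node["name"]} {node["ip"]} {node["type"]}'
        for node in data.get("node", [])
        if node.get("type") in wanted_types
    ]
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines


# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "nodes.txt")
saved = save_matching_nodes(dict_data, out_path)
print(saved)
```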

Accessing nested json objects using python

I am trying to interact with an API and running into issues accessing nested objects. Below is sample json output that I am working with.
{
"results": [
{
"task_id": "22774853-2b2c-49f4-b044-2d053141b635",
"params": {
"type": "host",
"target": "54.243.80.16",
"source": "malware_analysis"
},
"v": "2.0.2",
"status": "success",
"time": 227,
"data": {
"details": {
"as_owner": "Amazon.com, Inc.",
"asn": "14618",
"country": "US",
"detected_urls": [],
"resolutions": [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
],
"response_code": 1,
"verbose_msg": "IP address in dataset"
},
"match": true
}
}
]
}
The deepest I am able to access is the data portion, which returns too much. Ideally I am just trying to access as_owner, asn, country, detected_urls and resolutions.
When I try to access details, response_code, etc., I get a KeyError. My nested JSON goes deeper than the ones in other questions, and I have tried that logic.
Below is my current code snippet. Any help is appreciated!
import requests
import json
headers = {
'Content-Type': 'application/json',
}
params = (
('wait', 'true'),
)
data = '{"target":{"one":{"type": "ip","target": "54.243.80.16", "sources": ["xxx","xxxxx"]}}}'
r=requests.post('https://fakewebsite:8000/api/services/intel/lookup/jobs', headers=headers, params=params, data=data, auth=('apikey', ''))
parsed_json = json.loads(r.text)
#results = parsed_json["results"]
for item in parsed_json["results"]:
    print(item['data'])

You just need to index correctly into the converted JSON. Then you can easily loop over a list of the keys you want to fetch, since they are all in the "details" dictionary.
import json
raw = '''\
{
"results": [
{
"task_id": "22774853-2b2c-49f4-b044-2d053141b635",
"params": {
"type": "host",
"target": "54.243.80.16",
"source": "malware_analysis"
},
"v": "2.0.2",
"status": "success",
"time": 227,
"data": {
"details": {
"as_owner": "Amazon.com, Inc.",
"asn": "14618",
"country": "US",
"detected_urls": [],
"resolutions": [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
],
"response_code": 1,
"verbose_msg": "IP address in dataset"
},
"match": true
}
}
]
}
'''
parsed_json = json.loads(raw)
wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']
for item in parsed_json["results"]:
    details = item['data']['details']
    for key in wanted:
        print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
output
as_owner : "Amazon.com, Inc."
asn : "14618"
country : "US"
detected_urls : []
resolutions : [
{
"hostname": "bumbleride.com",
"last_resolved": "2016-09-15 00:00:00"
},
{
"hostname": "chilitechnology.com",
"last_resolved": "2016-09-16 00:00:00"
}
]
BTW, when you fetch JSON data using requests there's no need to use json.loads: you can get the parsed JSON by calling the .json() method of the returned response object instead of using its .text attribute.
Here's a more robust version of the main loop of the above code. It simply ignores any missing keys. I didn't post this code earlier because the extra if tests make it slightly less efficient, and I didn't know that keys could be missing.
for item in parsed_json["results"]:
    if 'data' not in item:
        continue
    data = item['data']
    if 'details' not in data:
        continue
    details = data['details']
    for key in wanted:
        if key in details:
            print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
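If the same kind of defensive lookup is needed in many places, the chain of if tests can be folded into a small helper. The sketch below is one way to do it. The name dig is made up and is not part of requests or the json module, and the parsed sample is a cut-down version of the response above with an extra item that has no "data" key.

```python
def dig(obj, *keys, default=None):
    """Follow keys (dict keys or list indexes) into nested data; return default if a step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key, IndexError: list too short,
            # TypeError: tried to index something that is not a dict or list
            return default
    return obj


# Cut-down sample; the second item deliberately lacks "data"
parsed = {
    "results": [
        {"data": {"details": {"asn": "14618", "country": "US"}}},
        {"status": "error"},
    ]
}

countries = [dig(item, "data", "details", "country") for item in parsed["results"]]
print(countries)
print(dig(parsed, "results", 0, "data", "details", "asn"))
```

The trade-off is that a typo in a key name silently produces the default instead of an exception, so this suits optional fields better than required ones.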

Why is the json returned by the Advanced REST Client different than that returned by the Requests module in Python?

ARC:
https://chrome.google.com/webstore/detail/advanced-rest-client/hgmloofddffdnphfgcellkdfbfbjeloo?hl=en-US
I saved the returned json in a .json file and transformed it into a pandas dataframe using:
temp_json = pd.read_json('TempJson.json', orient='columns')
This works great.
But then I used the requests module in Python 2.7.13, specifically:
myResponse = requests.post(url, json= payload, headers = headers)
jData = json.loads(myResponse.content)
And 1) the JSON structure is much different from temp_json and 2) it completely wrecks my code. Any idea why?
Snippet from temp_json:
{
"expand": "schema,names",
"startAt": 0,
"maxResults": 250,
"total": 3,
"issues": [
{
"expand": "operations,editmeta,changelog,transitions,renderedFields",
"id": "1954523",
"key": "SPGC-14075",
"fields": {"summary": "QA: Build concentration support into CDC automation",
"issuetype": {
"self": "https://itec-jira.fmr.com/rest/api/2/issuetype/20",
"id": "20",
"description": "Default sub-task",
"iconUrl": "https://itec-jira.fmr.com/images/icons/issuetypes/subtask_alternate.png",
"name": "Sub task",
"subtask": true
Sample from python json:
{
"issues": [
{
"key": "SPGC-25646",
"fields": {
"status": {
"statusCategory": {
"name": "To Do",
"self": "https://itec-jira.fmr.com/rest/api/2/statuscategory/2",
"id": 2,
"key": "new",
"colorName": "blue-gray"
},.....

json.loads creates a Python dict. In Python 2.7 a dict does not preserve insertion order, so the keys can come out in a different order than in the file. Check whether json.loads returns the same dict for both the request and the temp file. If they are different, then the data itself is different. You can use the pprint module to help you debug deeply nested JSON.
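A quick way to run that check is sketched below. The two JSON strings are invented stand-ins for the saved file and the requests response, holding the same data in a different key order. Dict equality ignores key order, and json.dumps with sort_keys=True gives a canonical text form that is convenient for diffing when the two do differ.

```python
import json
from pprint import pprint

# Hypothetical stand-ins: same data, different key order
from_file = json.loads('{"total": 3, "issues": [{"key": "SPGC-14075", "id": "1954523"}]}')
from_request = json.loads('{"issues": [{"id": "1954523", "key": "SPGC-14075"}], "total": 3}')

# Dict comparison does not care about key order
same = from_file == from_request
print(same)

# Canonical dumps are handy for a line-by-line diff when the data differs
canonical_a = json.dumps(from_file, sort_keys=True, indent=2)
canonical_b = json.dumps(from_request, sort_keys=True, indent=2)
print(canonical_a == canonical_b)

# Top-level keys present in one result but not in the other
only_in_file = set(from_file) - set(from_request)
print(only_in_file)

pprint(from_request)
```

In the snippets from the question, the two results carry different issue keys and different fields, which suggests the two requests themselves differ rather than the parsing.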
