Extracting data from JSON log

I am a beginner when it comes to programming. I'm trying to extract elements from a JSON log file, but I get an error and I don't know how to deal with it.
import json
with open("/Users/milosz/Desktop/logi.json") as f:
    data = json.load(f)
print(type(data['Objects']))
print(data)
for object in data['Objects']:
    print(object)
Error:
File "/Users/milosz/PycharmProjects/JsonDataExtracter/Program/Python Exracter.py", line 4, in <module>
print(type(data['Objects']))
TypeError: list indices must be integers or slices, not str
Process finished with exit code 1
I am sending the log below.
{
"_id": "635bd4bfc594743ce9b1a5a3",
"dateStart": "2022-10-28T13:09:28.609Z",
"dateFinish": "2022-10-28T13:10:23.698Z",
"method": "customer.file.upsert",
"request": {
"Objects": [
{
"ERPId": "6915",
"B24Id": 403772,
"FileName": "B2B000202",
"FileContent": "JVBERi0xLjMNJeLjz9MN",
"B24EntityId": 3334
}
]

Following up on the guidance from #accdias, here is a code snippet that closes the gaps in your JSON snippet and demonstrates how to access the Objects section:
import json
json_string = """
{
"_id": "635bd4bfc594743ce9b1a5a3",
"dateStart": "2022-10-28T13:09:28.609Z",
"dateFinish": "2022-10-28T13:10:23.698Z",
"method": "customer.file.upsert",
"request": {
"Objects": [
{
"ERPId": "6915",
"B24Id": 403772,
"FileName": "B2B000202",
"FileContent": "JVBERi0xLjMNJeLjz9MN",
"B24EntityId": 3334
}
]
}
}
"""
json_dict = json.loads(json_string)
print(json_dict["request"]["Objects"])
Output:
[{'ERPId': '6915', 'B24Id': 403772, 'FileName': 'B2B000202', 'FileContent': 'JVBERi0xLjMNJeLjz9MN', 'B24EntityId': 3334}]
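The `TypeError: list indices must be integers or slices, not str` in the question suggests that the top level of the real log file is a list of such records rather than a single object. Here is a minimal sketch under that assumption; the two-record sample and the helper name `iter_objects` are made up for illustration:

```python
import json

# Hypothetical sample: a log whose top level is a list of records.
sample_log = """
[
  {"method": "customer.file.upsert",
   "request": {"Objects": [{"ERPId": "6915", "FileName": "B2B000202"}]}},
  {"method": "customer.file.upsert",
   "request": {"Objects": [{"ERPId": "6916", "FileName": "B2B000203"}]}}
]
"""

def iter_objects(records):
    """Yield every entry of request -> Objects, for one record or a list of records."""
    if isinstance(records, dict):
        records = [records]  # normalise a single record to a one-element list
    for record in records:
        # .get() with defaults skips records that lack "request" or "Objects"
        for obj in record.get("request", {}).get("Objects", []):
            yield obj

data = json.loads(sample_log)
file_names = [obj["FileName"] for obj in iter_objects(data)]
print(file_names)
```

The same helper also accepts a single record like the one in the answer above, so it should work whichever shape the file turns out to have.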

Related

How to print out the exact field/string of the JSON output?

I'm trying to filter the result that I get from a GET request.
The only output I want is the summary, key and self fields,
but I'm getting a lot of JSON data.
I've tried googling how to do this and I'm getting nowhere.
Here is my code.
The commented-out lines are things I have already tried.
import requests
import json
import re
import sys
url ="--------"
auth='i.g--t----------', 'X4------'
r = requests.get(url, auth=(auth))
data = r.json()
#print( json.dumps(data, indent=2) )
#res1 = " ".join(re.split("summary", data))
#print ("first string result: ", str(res1))
#json_str = json.dumps(data)
#resp = json.loads(json_str)
#print (resp['id'])
#resp_dict = json.loads(resp_str)
#resp_dict.get('name')
#print('dasdasd', json_str["summary"])
Example of the GET API output that I get from print( json.dumps(data, indent=2) ):
{
"id": "65621",
"self": "https://bboxxltd.atlassian.net/rest/api/2/issue/65621",
"key": "CMS-5901",
"fields": {
"summary": "new starter: Edoardo Bologna",
"customfield_10700": [
{
"id": "2",
"name": "BBOXX Rwanda HQ",
"_links": {
"self": "https://bboxxltd.atlassian.net/rest/servicedeskapi/organization/2"
}
}
},
"inwardIssue": {
"id": "65862",
"key": "BMT-2890",
"self": "https://bboxxltd.atlassian.net/rest/api/2/issue/65862",
"fields": {
"summary": "ERP Databases access with Read Only",
"status": {
"self": "https://bboxxltd.atlassian.net/rest/api/2/status/10000",
"description": "",
"iconUrl": "https://bboxxltd.atlassian.net/",
"name": "To Do",
"id": "10000",
"statusCategory": {
"self": "https://bboxxltd.atlassian.net/rest/api/2/statuscategory/2",
"id": 2,
"key": "new",
"colorName": "blue-gray",
"name": "To Do"
}
},
"priority": {
"self": "https://bboxxltd.atlassian.net/rest/api/2/priority/4",
"iconUrl": "https://bboxxltd.atlassian.net/images/icons/priorities/low.svg",
"name": "Low",
My error is:
Traceback (most recent call last):
File "c:/Users/IanJayloG/Desktop/Python Files/Ex_Files_Learning_Python/Exercise Files/Test/Untitled-1.py", line 17, in <module>
print('dasdasd', data["summary"])
KeyError: 'summary'
PS C:\Users\IanJayloG\Desktop\Python Files\Ex_Files_Learning_Python\Exercise Files> & C:/Users/IanJayloG/AppData/Local/Programs/Python/Python37-32/python.exe "c:/Users/IanJayloG/Desktop/Python Files/Ex_Files_Learning_Python/Exercise Files/Test/Untitled-1.py"
Traceback (most recent call last):
File "c:/Users/IanJayloG/Desktop/Python Files/Ex_Files_Learning_Python/Exercise Files/Test/Untitled-1.py", line 17, in <module>
print('dasdasd', json_str["summary"])
TypeError: string indices must be integers

The cause of the error message
print('dasdasd', json_str["summary"])
TypeError: string indices must be integers
is that you try to access the named field summary on a string (variable json_str), which does not work because strings don't have fields to access by name. If you use the indexing [] operator on a string, you can only provide integers or ranges to extract single characters or sequences from that string. This is obviously not what you're intending.
The keys self and key are at the top level of your JSON document, whereas summary is nested under fields, which is also why data["summary"] raises KeyError: 'summary'. This should do it, without any extra transformation applied:
import requests
# url and auth are defined as in the question
r = requests.get(url, auth=auth)
data = r.json()
data_summary = data['fields']['summary']  # nested under "fields"
data_self = data['self']
data_key = data['key']
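To try the lookups without calling the API, here is a small sketch that works on a trimmed copy of the response shown in the question. The helper name `pick_issue_fields` is hypothetical, and `.get()` is used so that a missing key gives `None` instead of a `KeyError`:

```python
# Trimmed sample of the parsed response (what r.json() returns).
data = {
    "id": "65621",
    "self": "https://bboxxltd.atlassian.net/rest/api/2/issue/65621",
    "key": "CMS-5901",
    "fields": {"summary": "new starter: Edoardo Bologna"},
}

def pick_issue_fields(issue):
    """Return only summary, key and self from one parsed issue dictionary."""
    return {
        # summary is nested one level down, under "fields"
        "summary": issue.get("fields", {}).get("summary"),
        "key": issue.get("key"),
        "self": issue.get("self"),
    }

picked = pick_issue_fields(data)
for name, value in picked.items():
    print(f"{name}: {value}")
```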

Dictionary length is equal to 3 but when trying to access an index receiving KeyError

I am attempting to parse a json response that looks like this:
{
"links": {
"next": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-08&end_date=2015-09-09&detailed=false&api_key=xxx",
"prev": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-06&end_date=2015-09-07&detailed=false&api_key=xxx",
"self": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-07&end_date=2015-09-08&detailed=false&api_key=xxx"
},
"element_count": 22,
"near_earth_objects": {
"2015-09-08": [
{
"links": {
"self": "http://www.neowsapp.com/rest/v1/neo/3726710?api_key=xxx"
},
"id": "3726710",
"neo_reference_id": "3726710",
"name": "(2015 RC)",
"nasa_jpl_url": "http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=3726710",
"absolute_magnitude_h": 24.3,
"estimated_diameter": {
"kilometers": {
"estimated_diameter_min": 0.0366906138,
"estimated_diameter_max": 0.0820427065
},
"meters": {
"estimated_diameter_min": 36.6906137531,
"estimated_diameter_max": 82.0427064882
},
"miles": {
"estimated_diameter_min": 0.0227984834,
"estimated_diameter_max": 0.0509789586
},
"feet": {
"estimated_diameter_min": 120.3760332259,
"estimated_diameter_max": 269.1689931548
}
},
"is_potentially_hazardous_asteroid": false,
"close_approach_data": [
{
"close_approach_date": "2015-09-08",
"close_approach_date_full": "2015-Sep-08 09:45",
"epoch_date_close_approach": 1441705500000,
"relative_velocity": {
"kilometers_per_second": "19.4850295284",
"kilometers_per_hour": "70146.106302123",
"miles_per_hour": "43586.0625520053"
},
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
],
"is_sentry_object": false
},
}
I am trying to figure out how to get to the "miss_distance" dictionary values, and I am unable to wrap my head around it.
Here is what I have been able to do so far.
After I get a Response object from requests.get():
response = requests.get(url)
I convert the response object to a JSON object:
data = response.json()  # this returns a dictionary object
I try to parse the first level of the dictionary:
for i in data:
    if i == "near_earth_objects":
        dataset1 = data["near_earth_objects"]["2015-09-08"]
        # this returns the next object, which is of type list
Could someone please explain:
1. How to decipher this response in the first place.
2. How I can move forward in parsing the response object and get to the miss_distance dictionary.
Any pointers or help would be appreciated.
Thank you

Your data has several nested levels: each date maps to a list of near earth objects, and each object has a list of close approaches:
near_earth_objects = data['near_earth_objects']
for date in near_earth_objects:
    objects = near_earth_objects[date]
    for neo in objects:
        close_approach_data = neo['close_approach_data']
        for close_approach in close_approach_data:
            print(close_approach['miss_distance'])

The code below builds a list of date and miss_distance pairs for every object on every date:
import json
raw_json = '''
{
"near_earth_objects": {
"2015-09-08": [
{
"close_approach_data": [
{
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
]
}
]
}
}
'''
if __name__ == "__main__":
    parsed = json.loads(raw_json)
    # assuming this json includes more than one near_earth_object spread across dates
    near_objects = []
    for date, near_objs in parsed['near_earth_objects'].items():
        for obj in near_objs:
            for appr in obj['close_approach_data']:
                o = {
                    'date': date,
                    'miss_distances': appr['miss_distance']
                }
                near_objects.append(o)
    print(near_objects)
output:
[
{'date': '2015-09-08',
'miss_distances': {
'astronomical': '0.0269230459',
'lunar': '10.4730648551',
'kilometers': '4027630.320552233',
'miles': '2502653.4316094954'
}
}
]
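Note that the miss distances arrive as strings, so they need converting before any numeric comparison. Here is a small sketch, assuming a list shaped like near_objects above; the second entry and the helper name `closest_approach` are invented for illustration:

```python
# Entries shaped like the near_objects list; the second one is made-up sample data.
near_objects = [
    {"date": "2015-09-08",
     "miss_distances": {"astronomical": "0.0269230459",
                        "kilometers": "4027630.320552233"}},
    {"date": "2015-09-07",
     "miss_distances": {"astronomical": "0.1000000000",
                        "kilometers": "14959787.07"}},
]

def closest_approach(entries, unit="kilometers"):
    """Return the entry with the smallest miss distance in the given unit."""
    # The distances are strings in the JSON, so convert to float for comparison.
    return min(entries, key=lambda e: float(e["miss_distances"][unit]))

nearest = closest_approach(near_objects)
print(nearest["date"], float(nearest["miss_distances"]["kilometers"]))
```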

Update arrays in json object

I tried to update an array in a JSON object. Here is my JSON object:
{
"api.version": "v1",
"source": {
"thirdPartyRef": {
"resources": [{
"serviceType": "AwsElbBucket",
"path": {
"pathExpression": "songs/*"
},
"authentication": {
"type": "S3BucketAuthentication"
}
}]
}
}
}
Below is the code that reads the JSON and updates awsId. My requirement is to add AWS credentials to the authentication section.
Once the program has run successfully, it should look like this:
"authentication": {
"type": "S3BucketAuthentication",
"awsId": "AKIAXXXXX",
"awsKey": "MYHSHSYjusXXX"
}
Here is my snippet of code; args[5] is the JSON file:
with open(args[5]) as json_data:
    source = json.loads(json_data.read())
    # source['source']['category']['awsID'] = "test"
    source.update( {"awsId" : "AKIAXXXXX", "awsKey": "HHSJSHS"})
    print source
output:
{u'api.version': u'v1', 'awsKey': 'HHSJSHS', 'awsId': 'AKIAXXXXX', u'source': {u'thirdPartyRef': {u'resources': [{u'path': {u'pathExpression': u'songs/*'}, u'serviceType': u'AwsElbBucket', u'authentication': {u'type': u'S3BucketAuthentication'}}]}}}
I also tried source.update({"source": {"awsId": "AKIAXXXXX", "awsKey": "HHSJSHS"}}), but it overwrites the rest of the JSON.

The data structure that you want to update is buried fairly deeply. You can't access it from the very top level.
Try this:
import json
with open('arg5.json') as json_data:
    source = json.load(json_data)
# "resources" is a list, so index its first element before reaching "authentication"
source["source"]["thirdPartyRef"]["resources"][0]["authentication"].update(
    {"awsId" : "AKIAXXXXX", "awsKey": "HHSJSHS"})
print(source)
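If the resources list may hold more than one entry, the same update can be applied in a loop. This is only a sketch: the helper name `add_credentials` is hypothetical, and the JSON is inlined as a string here instead of being read from a file.

```python
import json

# The JSON object from the question, inlined so the example is self-contained.
raw = """
{
  "api.version": "v1",
  "source": {"thirdPartyRef": {"resources": [
    {"serviceType": "AwsElbBucket",
     "path": {"pathExpression": "songs/*"},
     "authentication": {"type": "S3BucketAuthentication"}}
  ]}}
}
"""

def add_credentials(config, aws_id, aws_key):
    """Add awsId/awsKey to the authentication block of every resource, in place."""
    for resource in config["source"]["thirdPartyRef"]["resources"]:
        # setdefault creates an empty authentication dict if a resource has none
        resource.setdefault("authentication", {}).update(
            {"awsId": aws_id, "awsKey": aws_key})
    return config

source = add_credentials(json.loads(raw), "AKIAXXXXX", "HHSJSHS")
updated_text = json.dumps(source, indent=2)  # write this string to a file to persist it
print(updated_text)
```

Because `update` is called on the nested authentication dictionary rather than on the top-level one, the rest of the document is left untouched.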

Reformat non-serializable JSON-ish data into a format suitable for value extraction in Python

With the following simple Python script:
import json
file = 'toy.json'
data = json.loads(file)
print(data['gas']) # example
My data generates the error ...is not JSON serializable.
With this, slightly more sophisticated, Python script:
import json
import sys
#load the data into an element
data = open('transactions000000000029.json', 'r')
#dumps the json object into an element
json_str = json.dumps(data)
#load the json to a string
resp = json.loads(json_str)
#extract an element in the response
print(resp['gas'])
The same.
What I'd like to do is extract all the values of particular keys, so ideally I'd like to render the input like so:
...
"hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
"gasUsed": "21000",
"hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26"
"gasUsed": "21000"
...
The data looks like this:
{
"blockNumber": "1941794",
"blockHash": "0x41ee74e34cbf9ef4116febea958dbc260e2da3a6bf6f601bfaeb2cd9ab944a29",
"hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
"from": "0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
"to": "0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
"gas": "121000",
"gasUsed": "21000",
"gasPrice": "20000000000",
"input": "",
"logs": [],
"nonce": "14",
"value": "0x24406420d09ce7440000",
"timestamp": "2016-07-24 20:28:11 UTC"
}
{
"blockNumber": "1941716",
"blockHash": "0x75e1602cad967a781f4a2ea9e19c97405fe1acaa8b9ad333fb7288d98f7b49e3",
"hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26",
"from": "0xa0480c6f402b036e33e46f993d9c7b93913e7461",
"to": "0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
"gas": "121000",
"gasUsed": "21000",
"gasPrice": "20000000000",
"input": "",
"logs": [],
"nonce": "1",
"value": "0xde0b6b3a7640000",
"timestamp": "2016-07-24 20:12:17 UTC"
}
What would be the best way to achieve that?
I've been thinking that perhaps the best way would be to reformat it as valid json?
Or maybe to just treat it like regex?

Your JSON file is not valid. The data should be a list of dictionaries, with each dictionary separated from the next by a comma, like this:
[
{
"blockNumber":"1941794",
"blockHash": "0x41ee74bf9ef411d9ab944a29",
"hash":"0xf2ef9daf63",
"from":"0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
"to":"0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
"gas":"121000",
"gasUsed":"21000",
"gasPrice":"20000000000",
"input":"",
"logs":[
],
"nonce":"14",
"value":"0x24406420d09ce7440000",
"timestamp":"2016-07-24 20:28:11 UTC"
},
{
"blockNumber":"1941716",
"blockHash":"0x75e1602ca8d98f7b49e3",
"hash":"0xf8f2a397b0f7bb1ff212e193c0252fab26",
"from":"0xa0480c6f402b036e33e46f993d9c7b93913e7461",
"to":"0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
"gas":"121000",
"gasUsed":"21000",
"gasPrice":"20000000000",
"input":"",
"logs":[
],
"nonce":"1",
"value":"0xde0b6b3a7640000",
"timestamp":"2016-07-24 20:12:17 UTC"
}
]
Then use this to open the file:
import json
with open('toy.json') as data_file:
    data = json.load(data_file)
You can then render the desired output like:
for item in data:
    print(item['hash'])
    print(item['gasUsed'])

If each block is valid JSON data, you can parse them separately:
import json
data = []
with open('transactions000000000029.json') as inpt:
    lines = []
    for line in inpt:
        if line.startswith('{'):  # block starts
            lines = [line]
        else:
            lines.append(line)
        if line.startswith('}'):  # block ends
            data.append(json.loads(''.join(lines)))
for block in data:
    print("hash: {}".format(block['hash']))
    print("gasUsed: {}".format(block['gasUsed']))
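Another option that does not depend on where the braces fall on a line is `json.JSONDecoder.raw_decode`, which parses one JSON value from a string and reports the index where it stopped. The sketch below assumes the whole file fits in memory as one string; the helper name `iter_json_objects` is hypothetical and the sample hashes are shortened:

```python
import json

# Two JSON objects back to back with no enclosing list, as in the question.
text = """
{
  "hash": "0xf2b5",
  "gasUsed": "21000"
}
{
  "hash": "0xf8f2",
  "gasUsed": "21000"
}
"""

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

blocks = list(iter_json_objects(text))
for block in blocks:
    print("hash: {}".format(block["hash"]))
    print("gasUsed: {}".format(block["gasUsed"]))
```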

Loading JSON file for reading and selecting data

I have a JSON file that I want to load into Python. I want to pull particular keys out of the file (which is very big), such as country rank or review, from data taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a JSON file if it is very big?

I think you need to open the file first and then pass the file object to json.load, like this:
import json
from pprint import pprint
with open('filename.json') as data:
    output = json.load(data)
pprint(output)

Try the following:
import json
with open("json_file_path", 'r') as f:  # r for reading the file
    json_data_file = f.read()
json_data = json.loads(json_data_file)
Access the data using the keys as follows:
json_data['key']

json.load() expects an open file object, not a filename:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example if your json data looked like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": {
"id": "valore"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
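As for selecting only part of a big file: json.load reads the whole document into memory, after which you can keep just the pieces you need. Here is a minimal sketch using the sample data above; the helper name `select_keys` is hypothetical, and the string stands in for the file contents:

```python
import json

# Stand-in for the file contents; with a real file, use json.load on the open file.
raw = """
{
  "maps": [
    {"id": "blabla", "iscategorical": "0"},
    {"id": "blabla", "iscategorical": "0"}
  ],
  "masks": {"id": "valore"},
  "om_points": "value",
  "parameters": {"id": "valore"}
}
"""

def select_keys(document, wanted):
    """Return a new dict holding only the wanted top-level keys that are present."""
    return {key: document[key] for key in wanted if key in document}

data = json.loads(raw)
subset = select_keys(data, ["masks", "om_points", "missing_key"])
map_ids = [entry["id"] for entry in data["maps"]]  # one value from each list element
print(subset)
print(map_ids)
```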
