Python Interpreter error while loading JSON file using json.load() - python

This is my python code for parsing a JSON file.
import os
import argparse
import json
import datetime
ResultsJson = "sample.json"
try:
    with open(ResultsJson, 'r') as j:
        jsonbuffer = json.load(j)
        result_data = json.loads(jsonbuffer)
        print("Just after loading json")
except Exception as e:
    print(e, exc_info=True)
I get an error like in the snapshot attached below.
I'm also attaching the JSON file "sample.json" that I'm using here.
sample.json
{
"idx": 1,
"timestamp": 1562781093.1182132,
"machine_id": "tool_2",
"part_id": "af71ce94-e9b2-47c0-ab47-a82600616b6d",
"image_id": "14cfb9e9-1f38-4126-821b-284d7584b739",
"cam_sn": "camera-serial-number",
"defects": [
{
"type": 0,
"tl": [169, 776],
"br": [207, 799]
},
{
"type": 0,
"tl": [404, 224],
"br": [475, 228]
},
{
"type": 1,
"tl": [81, 765],
"br": [130, 782]
}
],
"display_info": [
{
"info": "DEFECT DETECTED",
"priority": 2
}
]
}
Not sure what I missed here. I'm very new to Python (Coming from C++ background). Please be easy on me if I've missed something basic.

You don't need this line:
result_data = json.loads(jsonbuffer)
...because jsonbuffer is the result of json.load, so it's already the result of parsing the JSON file. In your case it's a Python dictionary, but json.loads expects a string, so you get an error.
Also, as the second error message says, exc_info is not a valid keyword argument of the print function. If you wanted to print the exception, just do print(e).
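Both errors can be reproduced in isolation. The snippet below is a minimal sketch: it uses io.StringIO as a stand-in for the opened file, and the JSON text is a trimmed version of sample.json rather than the full file. It also shows one way to get the full traceback text, using the standard traceback module.

```python
import io
import json
import traceback

# Stand-in for the opened file; the content is a trimmed version of sample.json.
fake_file = io.StringIO('{"idx": 1, "machine_id": "tool_2"}')

jsonbuffer = json.load(fake_file)   # already parsed into a Python object
print(type(jsonbuffer).__name__)

try:
    json.loads(jsonbuffer)          # a dict is not str, bytes or bytearray
except TypeError as e:
    first_error = str(e)
    print(first_error)

try:
    print("oops", exc_info=True)    # print() has no exc_info keyword
except TypeError as e:
    second_error = str(e)
    print(second_error)

# To see the whole traceback, format it with the traceback module.
try:
    json.loads(jsonbuffer)
except TypeError:
    trace_text = traceback.format_exc()
    print(trace_text)
```

The exc_info keyword belongs to the logging functions, for example logging.error("message", exc_info=True), which may be where the idea came from.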

You can do either:
with open(ResultsJson, 'r') as j:
    result_data = json.load(j)
    print("Just after loading json")
Or:
with open(ResultsJson, 'r') as j:
    result_data = json.loads(j.read())
    print("Just after loading json")
Internally, json.load() reads the file's contents and passes the text to json.loads().
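A quick check that the two calls give the same result. This is only a sketch: io.StringIO stands in for a real file, and the JSON text is a shortened, made-up version of the sample.

```python
import io
import json

text = '{"idx": 1, "defects": [{"type": 0, "tl": [169, 776]}]}'

from_file_object = json.load(io.StringIO(text))  # takes an object with .read()
from_string = json.loads(text)                   # takes the text itself

print(from_file_object == from_string)
```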

Related

Is there a way to normalize a json pulled straight from an api

Here is the type of json file that I am working with
{
"header": {
"gtfsRealtimeVersion": "1.0",
"incrementality": "FULL_DATASET",
"timestamp": "1656447045"
},
"entity": [
{
"id": "RTVP:T:16763243",
"isDeleted": false,
"vehicle": {
"trip": {
"tripId": "16763243",
"scheduleRelationship": "SCHEDULED"
},
"position": {
"latitude": 33.497833,
"longitude": -112.07365,
"bearing": 0.0,
"odometer": 16512.0,
"speed": 1.78816
},
"currentStopSequence": 16,
"currentStatus": "INCOMING_AT",
"timestamp": "1656447033",
"stopId": "2792",
"vehicle": {
"id": "5074"
}
}
},
{
"id": "RTVP:T:16763242",
"isDeleted": false,
"vehicle": {
"trip": {
"tripId": "16763242",
"scheduleRelationship": "SCHEDULED"
},
"position": {
"latitude": 33.562374,
"longitude": -112.07392,
"bearing": 359.0,
"odometer": 40367.0,
"speed": 15.6464
},
"currentStopSequence": 36,
"currentStatus": "INCOMING_AT",
"timestamp": "1656447024",
"stopId": "2794",
"vehicle": {
"id": "5251"
}
}
}
]
}
In my code, I am taking in the JSON as a string. But when I try to normalize the JSON string to put it into a data frame:
import pandas as pd
import json
import requests
base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()
packages_str = json.dumps(packages_json, indent=1)
df = pd.json_normalize(packages_str)
I get the error below. I am definitely making some rookie mistake, but how exactly am I using this wrong? Are there additional arguments that it may need?
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-33-aa23f9157eac> in <module>()
8 packages_str = json.dumps(packages_json, indent=1)
9
---> 10 df = pd.json_normalize(packages_str)
/usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
421 data = list(data)
422 else:
--> 423 raise NotImplementedError
424
425 # check to see if a simple recursive function is possible to
NotImplementedError:
When I had the JSON defined directly in my code as an object, without the header portion, pd.json_normalize(package_str) did work. Why would that be, and what additional things would I need to do?
The issue is that pandas.json_normalize expects either a dictionary or a list of dictionaries, but json.dumps returns a string.
It should work if you skip the json.dumps call and pass the parsed JSON directly to the normalizer, like this:
import pandas as pd
import json
import requests
base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()
df = pd.json_normalize(packages_json)
If you take a look at the corresponding source-code of pandas you can see for yourself:
if isinstance(data, list) and not data:
    return DataFrame()
elif isinstance(data, dict):
    # A bit of a hackjob
    data = [data]
elif isinstance(data, abc.Iterable) and not isinstance(data, str):
    # GH35923 Fix pd.json_normalize to not skip the first element of a
    # generator input
    data = list(data)
else:
    raise NotImplementedError
You should find this code at the path that is shown in the stacktrace, with the error raised on line 423:
/usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py
I would advise you to use a code-linter or an IDE that has one included (like PyCharm for example) as this is the type of error that doesn't happen if you have one.
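To see what normalizing a parsed dict amounts to, here is a rough pure-Python sketch of the flattening idea. It is not pandas's implementation: flatten is a hypothetical helper, and the data is a trimmed copy of the feed from the question. In pandas itself, passing record_path="entity" to json_normalize is one way to get one row per vehicle.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and keep the path in the key name.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Trimmed copy of the parsed feed (a dict, not a string).
packages_json = {
    "header": {"gtfsRealtimeVersion": "1.0", "timestamp": "1656447045"},
    "entity": [
        {
            "id": "RTVP:T:16763243",
            "vehicle": {
                "trip": {"tripId": "16763243"},
                "position": {"latitude": 33.497833, "longitude": -112.07365},
            },
        }
    ],
}

# One flat row per entry in the "entity" list.
rows = [flatten(entity) for entity in packages_json["entity"]]
print(rows[0])
```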
I'm not sure where the problem is, but if you are desperate, you can always write a text function that mines the values out of that JSON string.
Yes, it will be quite tedious, but with about 10 variables to extract for each row, you should be done in about 60 minutes.
Something like this:
def MineJson(text, target):  # target is, for example, "id"
    #print(text)
    findword = text.find(target)
    # Skip the key plus the 4 characters '": "' so the opening quote of the value is not included
    g = findword + len(target) + 4
    new_text = text[g:]  # output should start with RTVP:T...
    return new_text
def WhatsAfter(text):  # should return the new text and RTVP:T:16763243
    #print(text)
    toFind = '"'
    findEnd = text.find(toFind)
    g = findEnd
    value = text[:g]  # everything before the closing quote
    new_text = text[g:]
    return new_text, value
I wrote it without testing, so maybe there will be some mistakes.

When I try to read a JSON data from a file (exported from xlsx) in python, it throws KeyError

I had .xlsx file, and I converted it to JSON. I am using python to get the data from this json file. I am able for example to search for Build# and then I get the corresponding level, but when I search for values in, for example, "14H0232" or "14H4812" it throws a KeyError.
'''
import json
f = open('try.json')
data = json.load(f)
input= input('Enter the value: ')
for i in data['F6']:
    if i['14H0232'] == input:
        print(i['LEVEL'])
f.close()
'''
A Snippet of the json file.
'''
{
"F6": [
{
"LEVEL": "2.0.6.0",
"ID": "dataID",
"Build#": "9",
"prod/dev": "prod ",
"14H4812": "data1\r\ndata2",
"14H4826": "data",
"14H4813": "data1\r\ndata2"
}
],
"F5": [
{
"LEVEL": "2.0.5.1",
"ID": "dataID",
"Build#": "18",
"prod/dev": "prod",
"14H0232": "data1: data1\r\ndata2: data2\r\ndata3: data3",
"14H12321": "data1\r\ndata2"
}
],
"F4": [
{
"LEVEL": "2.0.4.1",
"ID": "dataID",
"Build#": "18",
"prod/dev": "prod",
"14H0232": "data1: data1\r\ndata2: data2\r\ndata3: data3",
"14H12321": "data1\r\ndata2"
}
]
}
'''
The problem is in your loop. When you try to access the value for "14H0232", that key simply doesn't exist in the entries under F6, so the lookup raises a KeyError. The case for 'Build#' is different, I assume, because that key is always there. The example you shared also shows that F6 has no key with the ID you specified.
To avoid this kind of error, you can put your 'if' statement in a try block and catch the exception.
try:
    if i['14H0232'] == input:
        print(i['LEVEL'])
except KeyError:
    print("The key is not found but the code continues to execute")
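An alternative to try/except is dict.get(), which returns None instead of raising when a key is missing. The sketch below also searches every group (F6, F5, ...) rather than only F6, which is an assumption about what the search is meant to do. find_levels is a hypothetical helper, and the data is a trimmed copy of the snippet from the question.

```python
# Trimmed copy of the JSON from the question, already parsed.
data = {
    "F6": [{"LEVEL": "2.0.6.0", "Build#": "9", "14H4812": "data1\r\ndata2"}],
    "F5": [{"LEVEL": "2.0.5.1", "Build#": "18", "14H0232": "data1: data1"}],
}


def find_levels(data, key, wanted):
    """Return the LEVEL of every entry whose `key` equals `wanted` (hypothetical helper)."""
    levels = []
    for entries in data.values():         # look in F6, F5, ... not only F6
        for entry in entries:
            if entry.get(key) == wanted:  # .get() returns None when the key is absent
                levels.append(entry["LEVEL"])
    return levels


print(find_levels(data, "14H0232", "data1: data1"))
print(find_levels(data, "Build#", "9"))
```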

Using python sort() to sort a json file - TypeError

I've imported a json file into a python variable in order to sort it by values.
It looks something like this:
import json
def custom_sort(package):
    return package['analytics']
with open('my_file.json','r') as f:
    data=json.load(f)
data.sort(key=custom_sort)
This is the data in the json file:
[
{
"name": "a2ps",
"desc": "Any-to-PostScript filter",
"analytics":
{ "install_30d": 77,
"install_90d": 322,
"install_365d": 1146 }
},
{
"name": "a52dec",
"desc": "Library for decoding ATSC A/52 streams (AKA 'AC-3')",
"analytics":
{ "install_30d": 41,
"install_90d": 153,
"install_365d": 619 }
},
]
And I get the following error:
data.sort(key=custom_sort)
TypeError: '<' not supported between instances of 'dict' and 'dict'
Does anyone know why "sort()" doesn't work? The same happens with "sorted()".
(The data type of the variable "data" is a list).
Many thanks in advance!
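The error message points at the key function: it returns package['analytics'], which is itself a dict, and Python 3 does not define < between two dicts. A likely fix is to return a single comparable value from inside that dict. The sketch below assumes the intent is to sort by the 30-day install count; picking install_30d is a guess, and the data is inlined from the question instead of being read from my_file.json.

```python
data = [
    {"name": "a2ps",
     "analytics": {"install_30d": 77, "install_90d": 322, "install_365d": 1146}},
    {"name": "a52dec",
     "analytics": {"install_30d": 41, "install_90d": 153, "install_365d": 619}},
]


def custom_sort(package):
    # Return a number, not the whole analytics dict, so items can be compared.
    return package["analytics"]["install_30d"]


data.sort(key=custom_sort)
print([package["name"] for package in data])
```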

How can I get and print specific data from a json query?

I want to pull data based on a certain condition, and then print multiple items from the result of that query. Here's what I've done so far:
def rec():
    qe = JsonQ(r"C:\ShopFloor\data.json")
    res = qe.at('data').where('Status', '=', 1).get()
    for Process, Shortnum in res:
        print(Process['Process'] + " " + Shortnum['Shortnum'])
rec()
This is from the following JSON file:
{
"data": [
{
"Shortnum": "34567",
"Process": "SPA",
"Status": 1,
"Start_Time": "2016-12-14 15:54:35",
"Finish_Time": "2016-12-14 15:56:02"
},
{
"Shortnum": "34567",
"Process": "Figure",
"Status": 0,
"Start_Time": "2016-12-08 15:34:05",
"Finish_Time": "2016-12-08 15:34:09"
},
How can I get this to work properly? Ideally I am looking for this kind of output from the print:
SPA 34567
I cannot get the correct output; I get the error below. I realise I am passing too many arguments, but I couldn't think of a proper way to do it.
Exception has occurred: ValueError
too many values to unpack (expected 2)
File "C:\ShopFloor\main.py", line 101, in rec
for Process, Shortnum in res:
File "C:\ShopFloor\main.py", line 106, in <module>
rec()
The typical approach to working with JSON in Python is to load the JSON object as a Python dict:
import json
with open('C:/ShopFloor/data.json', 'r') as f:
    qe = json.load(f)
for item in qe['data']:
    if item['Status'] == 1:
        print(f'{item["Process"]} {item["Shortnum"]}')
Note this uses f-strings, which are available in Python 3.6 and later (be sure to alternate single and double quotes when accessing dictionary values in an f-string). In Python 2 or earlier Python 3 versions, replace the last line with:
print('{} {}'.format(item['Process'], item['Shortnum']))
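As for the ValueError in the question: assuming the query returns a list of dicts (which is what the error suggests), "for Process, Shortnum in res" tries to unpack each dict into two names. Each record has five keys, hence "too many values to unpack". A small sketch with a hand-written res standing in for the query result:

```python
# Hand-written stand-in for the query result: a list with one matching record.
res = [
    {"Shortnum": "34567", "Process": "SPA", "Status": 1,
     "Start_Time": "2016-12-14 15:54:35", "Finish_Time": "2016-12-14 15:56:02"},
]

try:
    for Process, Shortnum in res:  # tries to unpack the 5 keys of each dict into 2 names
        pass
except ValueError as e:
    unpack_error = str(e)
    print(unpack_error)

# Iterate over whole records instead and index each one by key.
lines = [f'{item["Process"]} {item["Shortnum"]}' for item in res]
print(lines)
```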

Loading JSON file for reading and selecting data

I have a json file that I load into python. I want to take a keyword from the file (which is very big), like country rank or review from info taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a json file if it is very big?
I think you need to open the file, then pass the file object to json.load, like this:
import json
from pprint import pprint
with open('filename.json') as data:
    output = json.load(data)
pprint(output)
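The AttributeError in the question comes from json.load() calling .read() on whatever it is given, and a filename string has no such method. The sketch below reproduces this with a throwaway temporary file; the file contents are made up for illustration.

```python
import json
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"om_points": "value", "masks": {"id": "valore"}}')
    path = tmp.name

try:
    json.load(path)             # a path string has no .read() method
except AttributeError as e:
    load_error = str(e)
    print(load_error)

with open(path) as datafile:    # json.load() wants the open file object
    data = json.load(datafile)
print(data["masks"]["id"])

os.remove(path)                 # clean up the temporary file
```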
Try the following:
import json
with open("json_file_path", 'r') as f:  # 'r' opens the file for reading
    json_data_file = f.read()
json_data = json.loads(json_data_file)
Access the data using the keys as follows:
json_data['key']
json.load() expects the file handle after it has been opened:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example if your json data looked like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": {
"id": "valore"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
