How do I loop through a newline-delimited JSON file in Python?


I'm having trouble looping over a file that contains multiple JSON documents, one per line. This is the content of my JSON file:
[{"Timestamp":"2019-05-17T18:00:00.19+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"3","Type":"MachineInfo"}}]}]
[{"Timestamp":"2019-05-17T18:00:10.502+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"1","Type":"MachineInfo"}}]}]
[{"Timestamp":"2019-05-17T18:00:05.814+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"9","Type":"MachineInfo"}}]}]
It doesn't work unless I manually add a comma (,) after each row and merge the rows into a single array, as below:
[{"Timestamp":"2019-05-17T18:00:00.19+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"3","Type":"MachineInfo"}}]},
{"Timestamp":"2019-05-17T18:00:10.502+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"1","Type":"MachineInfo"}}]},
{"Timestamp":"2019-05-17T18:00:05.814+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"9","Type":"MachineInfo"}}]}]
My code:
import json
def main():
    # Read the JSON file
    f = open('/home/amirizzat/Desktop/data.json')
    data = json.load(f)
    f.close()
    # Print the parsed JSON
    print(data)
# Call main
main()

It appears that your file isn't a single JSON document; instead, each line is its own JSON document.
You could do something like:
import json
with open('/home/amirizzat/Desktop/data.json') as f:
    # Parse every non-empty line as its own JSON document
    data = [json.loads(line) for line in f if line.strip()]
print(data)
That loops over the lines and deserializes the JSON on each one, collecting the results in a list.
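To go one step further and actually loop over the parsed records, here is a minimal sketch. It assumes every line is a valid JSON array with balanced brackets (ending in `}}]}]`), and it uses an in-memory `io.StringIO` as a stand-in for the real file. The helper name `iter_records` is made up for this illustration.

```python
import io
import json

# Stand-in for the real file: each line is one complete JSON array.
sample = io.StringIO(
    '[{"Timestamp":"2019-05-17T18:00:00.19+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"3","Type":"MachineInfo"}}]}]\n'
    '\n'
    '[{"Timestamp":"2019-05-17T18:00:10.502+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"1","Type":"MachineInfo"}}]}]\n'
)

def iter_records(lines):
    """Yield every record from a stream holding one JSON array per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        for record in json.loads(line):
            yield record

# Collect (timestamp, status) pairs from the nested structure.
statuses = []
for record in iter_records(sample):
    for item in record["Items"]:
        statuses.append((record["Timestamp"], item["Body"]["Status"]))

print(statuses)
```

With a real file, pass the open file object to `iter_records` instead of the `StringIO` stand-in.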

Related

I need help creating a simple python script that stores an attribute value from a custom json file

JSON file looks like this:
{"Clear":"Pass","Email":"noname#email.com","ID":1234}
There are hundreds of json files with different email values, which is why I need a script to run against all files.
I need to extract the value associated with the Email attribute, which is noname#email.com.
I tried using import json but I'm getting a decoder error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Script looks like this:
import json
json_data = json.loads("file.json")
print(json_data["Email"])
Thanks!

According to the docs, json.loads() takes a str, bytes or bytearray as argument. So if you want to load a json file this way, you should pass the content of the file instead of its path.
import json
file = open("file.json", "r") # Opens file.json in read mode
file_data = file.read()
json_data = json.loads(file_data)
file.close() # Remember to close the file after using it
You can also use json.load(), which takes an open file object as its argument:
import json
file = open("file.json", "r")
json_data = json.load(file)
file.close()
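Since the question mentions hundreds of files, the same idea can be wrapped in a loop. This is only a sketch under assumptions: the files are taken to sit in one folder and end in `.json`, the helper name `collect_emails` is invented here, and the demo writes two throwaway files with placeholder addresses into a temporary directory so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path

def collect_emails(folder):
    """Map each *.json file name in `folder` to the value of its "Email" key."""
    emails = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # .get() returns None instead of raising if a file has no "Email" key
        emails[path.name] = data.get("Email")
    return emails

# Demo with two temporary files (placeholder data, not from the question).
with tempfile.TemporaryDirectory() as tmp:
    for i, address in enumerate(["a@example.com", "b@example.com"]):
        content = json.dumps({"Clear": "Pass", "Email": address, "ID": i})
        Path(tmp, f"file{i}.json").write_text(content, encoding="utf-8")
    result = collect_emails(tmp)

print(result)
```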

Your script needs to open the file to get a file handle; then it can read the JSON.
This sample contains the code that reads the real JSON file (commented out). To simulate it, the sample uses a string identical to the data in the file.
import json
# This is how to read from the real JSON file:
# file_name = 'email.json'
# with open(file_name, 'r') as f_obj:
#     json_data = json.load(f_obj)
# This string equals the content of the JSON file
json_text = '{"Clear":"Pass","Email":"noname#email.com","ID":1234}'
json_data = json.loads(json_text)
print(json_data["Email"])
Result: noname#email.com

import json
with open("file.json", 'r') as f:
    file_content = f.read()
# Convert the JSON text to a Python dict
tmp = json.loads(file_content)
email = tmp["Email"]
As already pointed out in previous answers, json.loads() takes the contents of a file rather than a file path.

Pandas to JSON file formatting issue, adding \ to strings

I am using the pandas.DataFrame.to_json to convert a data frame to JSON data.
data = df.to_json(orient="records")
print(data)
This works fine and the output when printing is as expected in the console.
[{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040655,"u":"V","v":237.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:power","bt":1610040836,"u":"W","v":512.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040840,"u":"V","v":238.4}]
The problem comes when uploading it to an external API, which converts it to a file format, or when writing it to a file locally. The output has a \ added before every double quote.
def dataToFile(processedData):
    with open('data.json', 'w') as outfile:
        json.dump(processedData, outfile)
The result is shown below:
[{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024746,\"u\":\"W\",\"v\":40.3},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:voltage\",\"bt\":1610024751,\"u\":\"V\",\"v\":238.5},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024764,\"u\":\"W\",\"v\":39.7}]
Is there any formatting specifically I should be including/excluding when converting the data to a file format?

Your data variable is a string of JSON text, not a list of dictionaries. json.dump() therefore encodes that string a second time, as one JSON string, which is where the escaped \" come from. You can do a few things:
Use DataFrame.to_json() to write the file; the first argument of to_json() is the file path:
df.to_json('./data.json', orient='records')
Write the JSON string directly as text:
def write_text(text: str, path: str):
    with open(path, 'w') as file:
        file.write(text)
data = df.to_json(orient="records")
write_text(data, './data.json')
If you want to work with the data as Python dictionaries first:
import json
def write_json(data, path, indent=4):
    with open(path, 'w') as file:
        json.dump(data, file, indent=indent)
df_data = df.to_dict(orient='records')
# ...some operations here...
write_json(df_data, './data.json')
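The double encoding can be reproduced without pandas. The sketch below uses a made-up one-record list in place of the data frame, and `json.dumps()` stands in for `df.to_json()`, since both return JSON text.

```python
import json

# Made-up stand-in for the data frame's records.
records = [{"n": "sensor:voltage", "u": "V", "v": 237.3}]

# JSON text, comparable to what df.to_json(orient="records") returns.
json_text = json.dumps(records)

# Encoding the text again treats it as one big string and escapes its quotes.
double_encoded = json.dumps(json_text)

print(json_text)
print(double_encoded)

# Decoding twice gets the original records back.
recovered = json.loads(json.loads(double_encoded))
```

The second printed line is wrapped in quotes and has a backslash before each inner quote, which matches the symptom in the question.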

Reading a text file of dictionaries stored in one line

Question
I have a text file that records the metadata of research papers requested through the SemanticScholar API. However, when I wrote the requested data, I forgot to add "\n" after each individual record. This results in something that looks like
{<metadata1>}{<metadata2>}{<metadata3>}...
whereas it would look like this if I had added "\n":
{<metadata1>}
{<metadata2>}
{<metadata3>}
...
Now I would like to read the data. As all the metadata is stored in one line, I need some hacks:
First, I split the concatenated dicts on "{".
Then I try to convert each piece back to a dict. Note that I allow for a piece not being in proper JSON format.
import json
with open("metadata.json", "r") as f:
    for line in f.readline().split("{"):
        print(json.loads("{" + line.replace("\'", "\"")))
However, there is still an error message
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
How can I recover all the metadata I collected?
MWE
Note: to generate the metadata.json file I use, run the following code; it should work out of the box.
import json
import urllib.parse
import requests
baseURL = "https://api.semanticscholar.org/v1/paper/"
paperIDList = ["200794f9b353c1fe3b45c6b57e8ad954944b1e69",
               "b407a81019650fe8b0acf7e4f8f18451f9c803d5",
               "ff118a6a74d1e522f147a9aaf0df5877fd66e377"]
for paperID in paperIDList:
    response = requests.get(urllib.parse.urljoin(baseURL, paperID))
    metadata = response.json()
    record = dict()
    record["title"] = metadata["title"]
    record["abstract"] = metadata["abstract"]
    record["paperId"] = metadata["paperId"]
    record["year"] = metadata["year"]
    record["citations"] = [item["paperId"] for item in metadata["citations"] if item["paperId"]]
    record["references"] = [item["paperId"] for item in metadata["references"] if item["paperId"]]
    # Append the record without a trailing newline (this is what causes the problem)
    with open("metadata.json", "a") as fileObject:
        fileObject.write(json.dumps(record))

The problem is that split("{") yields an empty first item, corresponding to the text before the first opening {, so the first call tries to parse just "{" and fails. Skip the first element and the parsing works. The quote replacement is not needed, because json.dumps() already writes double-quoted JSON, and replacing apostrophes could corrupt text such as abstracts. Note that this only works as long as no title or abstract contains a { character:
import json
with open("metadata.json", "r") as f:
    for line in f.readline().split("{")[1:]:
        print(json.loads("{" + line))
As suggested in the comments, I would actually recommend recreating the file, or saving a new version in which every }{ is replaced by }\n{:
with open("metadata.json", "r") as f:
    data = f.read()
data_lines = data.replace("}{", "}\n{")
with open("metadata_mod.json", "w") as f:
    f.write(data_lines)
That way you will have the metadata of one paper per line, as you wanted.
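If splitting on brace characters feels fragile, another option is to let the standard library find the object boundaries with `json.JSONDecoder.raw_decode()`, which parses one value starting at a given index and returns the value together with the index where it ended. The sketch below is not from the answers above; the helper name `iter_concatenated_json` and the sample string are made up for illustration.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not accept leading whitespace, so skip it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Made-up sample: braces and an apostrophe inside the string values.
sample = '{"title": "A {braced} title", "year": 2019}{"title": "It\'s second", "year": 2020}'
records = list(iter_concatenated_json(sample))
print(records)
```

For the real file, read its whole content with `f.read()` and pass that string to the helper.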

Read JSON file correctly

I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia

With json.load() you don't need to read each line. You can do either of these:
import json
def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)
data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can request the JSON from GitHub with a GET:
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
Both give the same output. The data variable will be a list of dictionaries that you can iterate over in a for loop for further processing.

Your code reads one line at a time and parses each line individually as JSON. Unless the creator of the file wrote it in that format (unlikely, given the .json extension), that won't work, because JSON does not use line breaks to mark the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        # label appears to be buried in item["interaction"]
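When it is unclear which of the two layouts a file uses, one possible approach is to try the whole-document parse first and fall back to line-by-line parsing. This is a sketch only; the function name `parse_json_or_json_lines` and the two sample strings are assumptions made for this illustration.

```python
import json

def parse_json_or_json_lines(text):
    """Parse `text` as one JSON document, or else as one JSON value per line."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Several documents in one string raise an "Extra data" error,
        # so fall back to treating every non-empty line as its own document.
        return [json.loads(line) for line in text.splitlines() if line.strip()]

whole_document = '[{"text": "a"}, {"text": "b"}]'
json_lines = '{"text": "a"}\n{"text": "b"}\n'

from_whole = parse_json_or_json_lines(whole_document)
from_lines = parse_json_or_json_lines(json_lines)
print(from_whole == from_lines)
```

One caveat: a line-delimited file that holds a single object parses as a whole document, so the function returns that object rather than a one-element list.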

How to read a JSON file which has multiple JSON objects separated by new lines?

I want to read a json file in which each line contains a new json object.
File looks like below -
{'P':'a1','D':'b1','T':'c1'}
{'P':'a2','D':'b2','T':'c2'}
{'P':'a3','D':'b3','T':'c3'}
{'P':'a4','D':'b4','T':'c4'}
I'm trying to read this file like below -
print(pd.read_json("sample.json", lines=True))
I'm facing below exception -
ValueError: Expected object or value
The actual sample.json file is about 240 MB and has exactly this format: each line contains one new JSON object. I want to read this file using Python pandas.

As others have said in the comments, it's not really JSON, because JSON requires double quotes around strings. You can use ast.literal_eval():
import pandas as pd
import ast
with open('sample.json') as f:
    content = f.readlines()
pd.DataFrame([ast.literal_eval(line) for line in content])
Or replace the single quotes with double quotes (this breaks if a value itself contains an apostrophe):
import pandas as pd
import json
with open('sample.json') as f:
    content = f.readlines()
pd.DataFrame([json.loads(line.replace("'", '"')) for line in content])
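For a file of around 240 MB, it may be preferable to convert it once into valid JSON Lines, one line at a time, instead of holding every line in memory. The sketch below uses only the standard library; the function name `convert_to_json_lines` is hypothetical and the `io.StringIO` objects stand in for the real input and output files.

```python
import ast
import io
import json

def convert_to_json_lines(src, dst):
    """Rewrite Python-literal dict lines from `src` as JSON lines in `dst`."""
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = ast.literal_eval(line)       # safely parse the single-quoted dict
        dst.write(json.dumps(record) + "\n")  # write it back as valid JSON
        count += 1
    return count

# Stand-ins for the files; the second record contains an apostrophe.
src = io.StringIO("{'P':'a1','D':'b1','T':'c1'}\n{'P':\"it's\",'D':'b2','T':'c2'}\n")
dst = io.StringIO()
count = convert_to_json_lines(src, dst)
converted = [json.loads(line) for line in dst.getvalue().splitlines()]
print(count, converted)
```

Once the file has been converted this way, each of its lines is valid JSON, which is the layout that `pd.read_json(..., lines=True)` expects.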
