I am getting a JSON file with the following format:
// 20170407
// http://info.employeeportal.org
{
    "EmployeeDataList": [
        {
            "EmployeeCode": "200005ABH9",
            "Skill": "CT70",
            "Sales": 0.0,
            "LostSales": 1010.4
        }
    ]
}
I need to remove the extra comment lines present in the file.
I tried the following code:
import json
import commentjson

with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    '''employee_data = json.dump(json.load(json_data))'''
    '''employee_data = commentjson.load(json_data)'''
    print(employee_data)
I'm still not able to remove the comments from the file and bring the JSON into the correct format.
I don't see where things are going wrong. Any direction in this regard is highly appreciated. Thanks in advance.
You're not using commentjson correctly. It has the same interface as the json module:
import commentjson
with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)
print(employee_data)
Although in this case, your comments are simple enough that you probably don't need to install an extra module to remove them:
import json
with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    employee_data = json.loads(fixed_json)
print(employee_data)
Note that the difference between the two snippets is that json.loads is used instead of json.load, since you're parsing a string rather than a file object.
Try JSON-minify:
JSON-minify minifies blocks of JSON-like content into valid JSON by removing all whitespace and JS-style comments (single-line // and multiline /* .. */).
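For example, a rough sketch of that approach (assuming the package is installed with pip install JSON-minify and exposes a json_minify() helper, as its documentation describes):

import json
from json_minify import json_minify

with open('EmployeeDataList.json', 'r') as handle:
    # Strip // and /* ... */ comments, then parse the remaining valid JSON
    employee_data = json.loads(json_minify(handle.read()))

print(employee_data)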
I usually read the JSON as a normal file, delete the comments and then parse it as a JSON string. It can be done in one line with the following snippet:
with open(path,'r') as f: jsonDict = json.loads('\n'.join(row for row in f if not row.lstrip().startswith("//")))
IMHO it is very convenient because it does not need CommentJSON or any other non-standard library.
Well, that's not valid JSON, so just open it like you would a text document, then delete anything from // to \n.
with open("EmployeeDataList.json", "r") as rf:
with open("output.json", "w") as wf:
for line in rf.readlines():
if line[0:2] == "//"
continue
wf.write(line)
Your file is parsable using HOCON.
pip install pyhocon
>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
             [ConfigTree([('EmployeeCode', '200005ABH9'),
                          ('Skill', 'CT70'),
                          ('Sales', 0.0),
                          ('LostSales', 1010.4)])])])
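If you then need plain JSON back out, pyhocon also ships a converter. A hedged sketch, assuming HOCONConverter is available in your pyhocon version:

from pyhocon import ConfigFactory
from pyhocon.converter import HOCONConverter

conf = ConfigFactory.parse_file('data.txt')
print(HOCONConverter.to_json(conf))  # re-serialize the parsed tree as valid JSON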
If it is the same number of comment lines every time, you can just slice them off:
with open('EmployeeDataList.NOTjson', "r") as fh:
    rawText = fh.read()

# Drop the two leading comment lines
json_data = rawText.split("\n", 2)[2]
This way json_data is now the string of text without the leading comment lines.
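A small follow-up, assuming the remainder of the file is valid JSON once the comment lines are gone:

import json
employee_data = json.loads(json_data)
print(employee_data)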
I have a json file with the following format:
{
    "responses": [
        {
            "id": "123",
            "cid": "01A",
            "response": {nested lists and dictionaries}
        },
        {
            "id": "456",
            "cid": "54G",
            "response": {nested lists and dictionaries}
        }
    ]
}
And so on.
And I want to convert it into a json file like this:
{"id":"123", "cid":"01A", "response":{nested lists and dictionaries}},
{"id":"456", "cid":"54G", "response":{nested lists and dictionaries}}
or
{responses:[
{"id":"123", "cid":"01A", "response":{nested lists and dictionaries}},
{"id":"456", "cid":"54G", "response":{nested lists and dictionaries}}
]}
I don't care about the surrounding format as long as I have the information for each ID in just one line.
I have to do this while reading it because things like pd.read_json don't read this kind of file.
Thanks!
Maybe just dump it line-wise? But I guess I didn't understand your question right?
import json
input_lines = {"responses": ...}
with open("output.json", "w") as f:
for line in input_lines["responses"]:
f.write(json.dumps(line) + "\n")
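If the goal is to load the result into pandas afterwards, a file with one JSON object per line (JSON Lines) can usually be read back with the lines=True flag; a quick sketch using the output.json written above:

import pandas as pd

# Each line of output.json is a standalone JSON object
df = pd.read_json("output.json", lines=True)
print(df.head())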
You can use the built-in json library to print each response on a separate line. The json.dumps() function has an indent option if you want pretty-printing, but its default is to put everything on one line, which is what you want.
Here's an example that works for the input you showed in your post.
#!/usr/bin/env python3
import json
import sys
with open(sys.argv[1]) as json_file:
    obj = json.load(json_file)

print("{responses:[")
for response in obj['responses']:
    print(json.dumps(response))
print("]}")
Usage (assuming you named the program format_json.py):
$ chmod +x format_json.py
$ ./format_json.py my_json_input.json > my_json_output.json
Or, if you're not in a command-line environment, you can also hardcode the input and output filenames:
#!/usr/bin/env python3
import json

infile = 'my_json_input.json'
outfile = 'my_json_output.json'

with open(infile) as json_file, open(outfile, 'w') as out:
    obj = json.load(json_file)
    print("{responses:[", file=out)
    for response in obj['responses']:
        print(json.dumps(response), file=out)
    print("]}", file=out)
I have some files in YAML format; I need to find the $title text in the file and replace it with what I specify. This is roughly what the configuration file looks like:
JoinGame-MOTD:
  Enabled: true
  Messages:
    - '$title'
The YAML file may look different, so I want to write universal code that does not target any specific string but replaces every $title with what I specify.
What I was trying to do:
import sys
import yaml
with open(r'config.yml', 'w') as file:
    def tr(s):
        return s.replace('$title', 'Test')
    yaml.dump(file, sys.stdout, transform=tr)
Please help me. It is not necessary to work with my code; I will be happy with any example that suits me.
Might be easier to not use the yaml package at all.
with open("file.yml", "r") as fin:
with open("file_replaced.yml", "w") as fout:
for line in fin:
fout.write(line.replace('$title', 'Test'))
EDIT:
To update in place
with open("config.yml", "r+") as f:
contents = f.read()
f.seek(0)
f.write(contents.replace('$title', 'Test'))
f.truncate()
You can also read & write data in one go. os.path.join is optional, it makes sure the yaml file is read relative to path your script is stored
import re
import os
with open(os.path.join(os.path.dirname(__file__), 'temp.yaml'), 'r+') as f:
    data = f.read()
    f.seek(0)
    new_data = data.replace('$title', 'replaced!')
    f.write(new_data)
    f.truncate()
In case you wish to dynamically replace other keywords besides $title, like $description or $name, you can write a function using regex like this:
def replaceString(text_to_search, keyword, replacement):
    # re.escape the keyword because "$" is a regex metacharacter
    return re.sub(re.escape(keyword), replacement, text_to_search)

replaceString('My name is $name', '$name', 'Bob')  # -> 'My name is Bob'
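Combining that helper with the read-and-write-in-one-go snippet above, several placeholders could be swapped in a single pass. A sketch, where the extra placeholder names and values are made-up examples:

import os
import re

def replaceString(text_to_search, keyword, replacement):
    return re.sub(re.escape(keyword), replacement, text_to_search)

# Hypothetical placeholder/value pairs, purely for illustration
replacements = {'$title': 'Test', '$description': 'My server', '$name': 'Bob'}

with open(os.path.join(os.path.dirname(__file__), 'temp.yaml'), 'r+') as f:
    data = f.read()
    for keyword, value in replacements.items():
        data = replaceString(data, keyword, value)
    f.seek(0)
    f.write(data)
    f.truncate()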
Question
I have a text file that records metadata of research papers requested with the SemanticScholar API. However, when I wrote the requested data, I forgot to add "\n" after each individual record. This results in something that looks like
{<metadata1>}{<metadata2>}{<metadata3>}...
whereas it should look like this if I had added "\n":
{<metadata1>}
{<metadata2>}
{<metadata3>}
...
Now I would like to read the data back. As all the metadata is stored on one line, I need to do some hacks:
First, I split the concatenated dicts on "{".
Then I try to convert each string back to a dict. Note that I do account for the possibility that a line might not be in proper JSON format.
import json
with open("metadata.json", "r") as f:
for line in f.readline().split("{"):
print(json.loads("{" + line.replace("\'", "\"")))
However, I still get an error message:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I am wondering what I should do to recover all the metadata I collected.
MWE
Note: to generate the metadata.json file I use, run the following code; it should work out of the box.
import json
import urllib.parse

import requests

baseURL = "https://api.semanticscholar.org/v1/paper/"
paperIDList = ["200794f9b353c1fe3b45c6b57e8ad954944b1e69",
               "b407a81019650fe8b0acf7e4f8f18451f9c803d5",
               "ff118a6a74d1e522f147a9aaf0df5877fd66e377"]

for paperID in paperIDList:
    response = requests.get(urllib.parse.urljoin(baseURL, paperID))
    metadata = response.json()
    record = dict()
    record["title"] = metadata["title"]
    record["abstract"] = metadata["abstract"]
    record["paperId"] = metadata["paperId"]
    record["year"] = metadata["year"]
    record["citations"] = [item["paperId"] for item in metadata["citations"] if item["paperId"]]
    record["references"] = [item["paperId"] for item in metadata["references"] if item["paperId"]]
    with open("metadata.json", "a") as fileObject:
        fileObject.write(json.dumps(record))
The problem is that when you do the split("{") you get a first item that is empty, corresponding to the opening {. Just ignore the first element and everything works fine:
with open("metadata.json", "r") as f:
for line in f.readline().split("{")[1:]:
print(json.loads("{" + line).replace(r"\'", r"\""))
As suggested in the comments, I would actually recommend recreating the file or saving a new version where you replace }{ by }\n{:
with open("metadata.json", "r") as f:
data = f.read()
data_lines = data.replace("}{","}\n{")
with open("metadata_mod.json", "w") as f:
f.write(data_lines)
That way you will have one paper's metadata per line, as you want.
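Once metadata_mod.json has one record per line, reading everything back is straightforward (assuming each line is now a valid JSON object):

import json

with open("metadata_mod.json", "r") as f:
    records = [json.loads(line) for line in f if line.strip()]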
I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line; you can do either of these:
import json
def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)
data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET-request the JSON from GitHub:
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
These will both give the same output. The data variable will be a list of dictionaries that you can iterate over in a for loop for your further processing.
Your code reads one line at a time and parses each line individually as JSON. Unless the creator of the file wrote it in that line-delimited format (which, given it has a .json extension, is unlikely), that won't work, as JSON does not use line breaks to indicate the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
        for item in data:
            text = item["text"]
label appears to be buried in item["interaction"]
I am trying to store the JSON as a text file. I am able to print the data, but I am not able to store it in the file, and the output also comes with Unicode characters.
Please find my code below:
import json
from pprint import pprint
with open('20150827_abc_json') as data_file:
    f = open("file.txt", "wb")
    f.write(data=json.load(data_file))
    print (data)>f
    f.close()
When I execute it, the file gets created but it is zero bytes. How can I get rid of the Unicode characters and also store the output?
Output:
u'Louisiana', u'city': u'New Olreans'
To serialize JSON to a file you should use the json.dump function. Try the following code:
import json
from pprint import pprint
with open('20150827_abc_json') as data_file, open('file.txt', 'w') as f:
    data = json.load(data_file)
    print data
    json.dump(data, f)
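Regarding the Unicode characters: the u'...' prefixes only appear when printing the Python dict itself; the JSON text written by json.dump will not contain them. If you also want non-ASCII characters written literally instead of as \uXXXX escapes, a hedged sketch for Python 2 (to match the snippet above) is:

import codecs
import json

with open('20150827_abc_json') as data_file, codecs.open('file.txt', 'w', encoding='utf-8') as f:
    data = json.load(data_file)
    f.write(json.dumps(data, ensure_ascii=False))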
The print syntax is wrong: you put only a single > while there should be two of them, >>.
In Python 3 (or Python 2, if you from __future__ import print_function) you can also write it in a more explicit way:
print("blah blah", file=yourfile)
I would also suggest using a context manager for both files:
with open('20150827_abc_json') as data_file:
    with open("file.txt", "wb") as file:
        ...
otherwise you risk that an error will leave your destination file open and incomplete.
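Putting the two suggestions together, a minimal sketch of the corrected script (assuming Python 2 with the print_function import, to match the original code) might look like:

from __future__ import print_function
import json

with open('20150827_abc_json') as data_file:
    with open('file.txt', 'w') as out_file:
        data = json.load(data_file)
        print(data)                # show the parsed dict on the console
        json.dump(data, out_file)  # write valid JSON to file.txt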