My JSON file is huge: it contains many dictionaries and lists nested inside a dictionary, and its keys and strings are enclosed in double quotes. How should I process it? What is deserialization, and how do I use it?
Use the json module.
If you have the JSON in a file, you can use:
import json
with open("json_data.json", "r") as data:
    print(json.load(data))
OR
with open("json_data.json", "r") as data:
    print(json.loads(data.read()))
If you have the JSON in a variable, you can use:
jsonData = '{}'
jsonVal = json.loads(jsonData)
There is a module called json in the Python standard library, which you can use to serialize and deserialize a dictionary.
To serialize (Python object to JSON string), call json.dumps() on the object itself, not on a file handle:
my_dict = {"key": ["value"]}  # placeholder data
json_str = json.dumps(my_dict)
To deserialize a file, pass the open file to json.load() (json.loads() expects a string):
with open("huge_json_file.json", "r") as data:
    json_dict = json.load(data)
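To make the two terms concrete: serializing turns a Python object into JSON text, and deserializing turns JSON text back into Python objects. Here is a minimal round-trip sketch, assuming a small made-up nested dictionary (the names `record` and `restored` are illustrative only):

```python
import json

# Hypothetical nested data: dictionaries and lists inside a dictionary.
record = {"name": "example", "tags": ["a", "b"], "meta": {"count": 2}}

# Serialize: Python object -> JSON string (keys and strings get double quotes).
json_str = json.dumps(record)

# Deserialize: JSON string -> Python object.
restored = json.loads(json_str)

print(type(json_str).__name__)    # str
print(restored["meta"]["count"])  # 2
print(restored == record)         # True
```

The same pair exists for files: json.dump() writes to an open file and json.load() reads from one.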
I am getting a JSON file from a curl request and I want to read a specific value from it.
Suppose that I have a JSON file, like the following one. How can I insert the "result_count" value into a variable?
Currently, after getting the response from curl, I am writing the JSON objects into a txt file like this.
json_response = connect_to_endpoint(url, headers)
f.write(json.dumps(json_response, indent=4, sort_keys=True))
Your json_response isn't JSON content (JSON is a formatted string) but a Python dict, so you can access its values by key:
res_count = json_response['meta']['result_count']
Use the json module from the python standard library.
data itself is just a python dictionary, and can be accessed as such.
import json
with open('path/to/file/filename.json') as f:
    data = json.load(f)
result_count = data['meta']['result_count']
You can parse a JSON string using the json.loads() method in the json module.
response = connect_to_endpoint(url, headers)
json_response = json.loads(response)  # only needed if response is a JSON string
After that, you can extract an element by giving its key names in brackets:
result_count = json_response['meta']['result_count']
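Whether json.loads() is needed depends on what you are holding: JSON text has to be parsed first, while a dict can be indexed directly. A small sketch of handling both cases, assuming a made-up payload shaped like the one in the question (`get_result_count` is a hypothetical helper, not part of any library):

```python
import json

def get_result_count(response):
    """Return meta.result_count from JSON text or an already-parsed dict.

    Hypothetical helper for illustration only.
    """
    # Parse only if we were handed JSON text rather than a Python dict.
    if isinstance(response, (str, bytes)):
        data = json.loads(response)
    else:
        data = response
    return data["meta"]["result_count"]

# Made-up sample payloads.
as_text = '{"meta": {"result_count": 10}}'
as_dict = {"meta": {"result_count": 10}}

print(get_result_count(as_text))  # 10
print(get_result_count(as_dict))  # 10
```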
I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line, you can do either of these:
import json
def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)
data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET request the json from GitHub
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
Both give the same result: data will be a list of dictionaries that you can iterate over in a for loop for further processing.
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file wrote it in that format (which, given the .json extension, is unlikely), that won't work, because JSON does not use line breaks to mark the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        # label appears to be buried in item["interaction"]
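The difference between one JSON document holding a list and a file with one object per line can be checked without touching the dataset. The sketch below uses small in-memory strings (made-up data, not the BioRelEx file) to show why line-by-line parsing fails on the first shape and only suits the second:

```python
import json

# A single JSON document holding a list of objects (the shape described above).
array_text = '[\n  {"text": "first"},\n  {"text": "second"}\n]'

# Parsing the whole document at once works.
items = json.loads(array_text)
print([item["text"] for item in items])  # ['first', 'second']

# Parsing it line by line fails, because "[" alone is not valid JSON.
try:
    for line in array_text.splitlines():
        json.loads(line)
except json.JSONDecodeError as exc:
    print("line-by-line parse failed:", type(exc).__name__)

# Line-by-line parsing only suits files with one complete object per line.
lines_text = '{"text": "first"}\n{"text": "second"}\n'
per_line = [json.loads(line) for line in lines_text.splitlines() if line.strip()]
print(per_line == items)  # True
```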
I got some data from an API with Python, and I'm trying to print it to a file. My understanding was that the indent argument lets you pretty print. Here's my code:
import urllib2, json
APIKEY_VALUE = "APIKEY"
APIKEY = "?hapikey=" + APIKEY_VALUE
HS_API_URL = "http://api.hubapi.com"
def getInfo():
    xulr = "/engagements/v1/engagements/paged"
    url = HS_API_URL + xulr + APIKEY + params
    response = urllib2.urlopen(url).read()
    with open("hubdataJS.json", "w") as outfile:
        json.dump(response, outfile, sort_keys=True, indent=4, ensure_ascii=False)
getInfo()
What I expected hubdataJS.json to look like when I opened it in Sublime text is some JSON with a format like this:
{
    a: some data
    b: [
        some list of data,
        more data
    ]
    c: some other data
}
What I got instead was all the data on one line, in quotes (I thought dumps was for outputting as a string), with lots of \s, \rs, and \ns.
Confused about what I'm doing wrong.
In your code, response is a bytestring that contains the data serialized in JSON format. When you call json.dump on it, you serialize that string to JSON a second time. You end up with a JSON file containing a single string, and inside that string is the original JSON data: JSON inside JSON.
To solve that, decode (deserialize) the bytestring you got from the internet before re-encoding it to JSON and writing it to the file:
response = json.load(urllib2.urlopen(url))
That will convert the serialized data from the web into a real Python object.
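The double encoding is easy to reproduce without a network call. The following Python 3 sketch assumes a short made-up response body (`response_text` is a stand-in, not real API output) and compares dumping the raw text with dumping the decoded object:

```python
import json

# Stand-in for the raw body returned by the API: JSON text, not a Python object.
response_text = '{"a": "some data", "b": [1, 2]}'

# Dumping the text directly encodes it a second time: one quoted, escaped string.
double_encoded = json.dumps(response_text, indent=4)
print(double_encoded)

# Decoding first, then dumping, produces the indented layout.
parsed = json.loads(response_text)
pretty = json.dumps(parsed, sort_keys=True, indent=4)
print(pretty)
```

The indent argument only affects containers (objects and arrays), which is why it has no visible effect when the thing being dumped is a plain string.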
I have this
import json
header = """{"fields": ["""
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(header, outfile)
The output of print is fine:
{"fields": [
but what ends up in fields.json is
"{\"fields\": ["
Why, and how can I solve it?
Thanks
There is no issue there. The \ in the file is to escape the ". To see this read the json back and print it. So to add to your code
import json
header = """{"fields": ["""
print(header)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json', 'w') as outfile:
    json.dump(header, outfile)
with open('C:\\Users\\tkaghdo\\Documents\\fields.json') as data_file:
    data = json.load(data_file)
print(data)
you will see the data is printed as {"fields": [
The JSON you are trying to write is being treated as a string, so " is converted to \". To avoid that, decode the JSON with json.loads() before writing it to the file.
The JSON must be complete, or json.loads() will raise an error.
import json
header = """{"name":"sk"}"""
h = json.loads(header)
print(header)
with open('fields.json', 'w') as outfile:
    json.dump(h, outfile)
First of all, it's not /, it's \.
Second, it's not in your JSON, because you don't really seem to have a JSON here, you have just a string. And json.dump() converts the string into a string escaped for JSON format, replacing all " with \".
By the way, you should not try to write incomplete JSON as a string yourself, it's better just to make a dictionary {"fields":[]}, then fill it with all values you want and later save into file through json.dump().
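A minimal sketch of that approach: build the structure as Python objects, then write it once. The field entries and the temporary output path are made up for illustration:

```python
import json
import os
import tempfile

# Build the structure as Python objects instead of hand-writing JSON text.
document = {"fields": []}

# Hypothetical field entries, added only for illustration.
document["fields"].append({"name": "id", "type": "integer"})
document["fields"].append({"name": "title", "type": "string"})

# Write complete, valid JSON in one step (temporary directory used here).
path = os.path.join(tempfile.mkdtemp(), "fields.json")
with open(path, "w") as outfile:
    json.dump(document, outfile, indent=2)

# Reading it back returns the same structure, with no escaped quotes involved.
with open(path) as infile:
    loaded = json.load(infile)
print(loaded == document)  # True
```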
I am getting a JSON file with following format :
// 20170407
// http://info.employeeportal.org
{
"EmployeeDataList": [
{
"EmployeeCode": "200005ABH9",
"Skill": CT70,
"Sales": 0.0,
"LostSales": 1010.4
}
]
}
I need to remove the extra comment lines present in the file.
I tried the following code:
import json
import commentjson
with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    '''employee_data = json.dump(json.load(json_data))'''
    '''employee_data = commentjson.load(json_data)'''
print(employee_data)
I am still not able to remove the comments from the file and bring the JSON file into the correct format.
I can't see where things are going wrong. Any direction in this regard is highly appreciated. Thanks in advance.
You're not using commentjson correctly. It has the same interface as the json module:
import commentjson
with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)
print(employee_data)
Although in this case, your comments are simple enough that you probably don't need to install an extra module to remove them:
import json
with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    employee_data = json.loads(fixed_json)
print(employee_data)
Note the difference here between the two code snippets is that json.loads is used instead of json.load, since you're parsing a string instead of a file object.
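The filter-then-parse idea can be tried on an in-memory copy of the data. This sketch assumes the text left after removing the comment lines is valid JSON, so "CT70" is quoted here; io.StringIO stands in for the real file:

```python
import io
import json

# Stand-in for the file; "CT70" is quoted so the remainder is valid JSON.
raw = io.StringIO(
    '// 20170407\n'
    '// http://info.employeeportal.org\n'
    '{\n'
    '  "EmployeeDataList": [\n'
    '    {"EmployeeCode": "200005ABH9", "Skill": "CT70", "Sales": 0.0, "LostSales": 1010.4}\n'
    '  ]\n'
    '}\n'
)

# Keep every line that is not a whole-line // comment (leading spaces allowed).
kept = [line for line in raw if not line.lstrip().startswith("//")]
employee_data = json.loads("".join(kept))

print(employee_data["EmployeeDataList"][0]["LostSales"])  # 1010.4
```

Because only the start of each line is checked, a // that appears inside a string value (such as a URL) is left alone.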
Try JSON-minify:
JSON-minify minifies blocks of JSON-like content into valid JSON by removing all whitespace and JS-style comments (single-line // and multiline /* .. */).
I usually read the JSON as a normal file, delete the comments and then parse it as a JSON string. It can be done in one line with the following snippet:
with open(path,'r') as f: jsonDict = json.loads('\n'.join(row for row in f if not row.lstrip().startswith("//")))
IMHO it is very convenient because it does not need CommentJSON or any other non standard library.
Well, that's not valid JSON, so just open it like you would a text document, then delete anything from // to \n.
with open("EmployeeDataList.json", "r") as rf:
    with open("output.json", "w") as wf:
        for line in rf.readlines():
            if line[0:2] == "//":
                continue
            wf.write(line)
Your file is parsable using HOCON.
pip install pyhocon
>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
[ConfigTree([('EmployeeCode', '200005ABH9'),
('Skill', 'CT70'),
('Sales', 0.0),
('LostSales', 1010.4)])])])
If it is the same number of comment lines every time, you can just do:
with open('EmployeeDataList.NOTjson', "r") as fh:
    rawText = fh.read()
n = 2  # number of leading comment lines to drop
json_data = rawText.split("\n", n)[n]
This way json_data is the text without the first n lines (the two comment lines in the question's file).