I have a nested dictionary with many items in a json file:
{
    "Create Code For Animals": {
        "mammals": {
            "dog": {
                "color": "brown",
                "name": "John",
                "legs": "four",
                "tail": "yes"
            },
            "cat": {
                "color": "blue",
                "name": "Johnny",
                "legs": "four",
                "tail": "yes"
            },
            "donkey": {
                "color": "grey",
                "name": "Mickey",
                "legs": "four",
                "tail": "yes"
            }
        }
    }
}
I want to replace the name in each one of the animals, then save it back to the file, AND keep the indent as it was (as shown).
I'm using the following two methods for loading and dumping the original and updated dictionary.
Everything works (the value is changed and saved back to the file) except that the indentation (format) of the lines is ruined after saving: the file is written as one long line, with '\n' shown after the updated value.
I've tried using 'pickle' (as seen in one of the posts here), but that didn't work either; it made a mess of all the data in the file.
def loadJson(self, jsonFilename):
    with open(FILE_PATH + '\\' + jsonFilename, 'r') as f:
        return json.load(f)

def writeJson(self, jsonFilename, jsonDict):
    with open(FILE_PATH + '\\' + jsonFilename, 'w') as f:
        return json.dump(jsonDict, f)
Any help will do.
Both json.dumps and json.dump have a parameter called indent. From the docstring:
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation. Since the default item separator is ``', '``, the
output might include trailing whitespace when ``indent`` is specified.
You can use ``separators=(',', ': ')`` to avoid this.
Something like this would do:
json.dump(jsonDict, f, indent=4)
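Applied to the writeJson method from the question, a minimal sketch might look like this (indent=4 matches the four-space nesting shown; separators is optional and just avoids trailing whitespace):

def writeJson(self, jsonFilename, jsonDict):
    with open(FILE_PATH + '\\' + jsonFilename, 'w') as f:
        # indent=4 pretty-prints nested objects across multiple lines
        json.dump(jsonDict, f, indent=4, separators=(',', ': '))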
I have a json file which looks like
{
    "0": {
        "id": 1700388,
        "title": "Disconnect: The Wedding Planner",
        "year": 2023,
        "imdb_id": "tt24640474",
        "tmdb_id": 1063242,
        "tmdb_type": "movie",
        "type": "movie"
    },
    "1": {
        "id": 1631017,
        "title": "The Pale Blue Eye",
        "year": 2022,
        "imdb_id": "tt14138650",
        "tmdb_id": 800815,
        "tmdb_type": "movie",
        "type": "movie"
    },
    "2": {
        "id": 1597915,
        "title": "The Man from Toronto",
        "year": 2022,
        "imdb_id": "tt11671006",
        "tmdb_id": 667739,
        "tmdb_type": "movie",
        "type": "movie"
    },
I am trying to read this and extract the "tmdb_id" to store as a variable, which I then intend to inject into a URL for an API GET request.
The next step is to add the response parameters back into the JSON file, then put it all in a loop; there are 1000 entries.
I have been trying to adapt other answers, but I suspect the string-number keys in my JSON are causing the issue. It has been giving me this error:
for i in data['0']:
TypeError: list indices must be integers or slices, not str
You can use json.load to read the file's content and then iterate to find those IDs:
import json

tmdb_ids = []

with open('test.json', 'r') as json_fp:
    imdb_info = json.load(json_fp)[0]
    for dict_index, inner_dict in imdb_info.items():
        tmdb_ids.append(inner_dict['tmdb_id'])

print(tmdb_ids)
Result:
[1063242, 800815, 667739]
You can make it more sophisticated if you need a specific tmdb_id from a specific "movie" by adding some if statements to the for loop, choosing movies you want.
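For example, a minimal sketch that filters on the title field (the title value here is just an illustration taken from the sample data):

for dict_index, inner_dict in imdb_info.items():
    # hypothetical filter: only collect the ID for one specific movie
    if inner_dict['title'] == 'The Pale Blue Eye':
        tmdb_ids.append(inner_dict['tmdb_id'])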
Edit
I've noticed this is an array of dictionaries, thus we can iterate over each "movie chunk" and extract the tmdb_id from each one:
with open('test.json', 'r') as json_fp:
    imdb_info = json.load(json_fp)
    tmdb_ids = [movie_info['tmdb_id']
                for movies_chunk in imdb_info
                for movie_index, movie_info in movies_chunk.items()]
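Written as nested loops, the comprehension above is equivalent to:

tmdb_ids = []
for movies_chunk in imdb_info:
    for movie_index, movie_info in movies_chunk.items():
        tmdb_ids.append(movie_info['tmdb_id'])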
For reading the JSON:

import json

with open("file.json", "r") as file:
    text = json.load(file)

For the tmdb_id:

for i in text:
    print(text[i]["tmdb_id"])
I have .json documents generated from the same code; multiple nested dicts are being dumped to each document. When loading with json.load(opened_json), I get a json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590) style error for some of the files but not for others, and I don't understand why. What is the proper way to dump multiple (possibly nested) dicts into JSON documents, and in my current case, what is the way to read them all? (Extra: the dicts can span multiple lines, so splitting on lines probably won't work.)
Ex: Say I run json.dump(data, file) with data = {'meta_data': {some_data}, 'real_data': {more_data}}.
Let us take these two fake files:
{
    "meta_data": {
        "id": 0,
        "start": 1238397024.0,
        "end": 1238397056.0,
        "best": []
    },
    "real_data": {
        "YAS": {
            "t1": [
                1238397047.2182617
            ],
            "v1": [
                5.0438767766574255
            ],
            "v2": [
                4.371670270544587
            ]
        }
    }
}
and
{
    "meta_data": {
        "id": 0,
        "start": 1238397056.0,
        "end": 1238397088.0,
        "best": []
    },
    "real_data": {
        "XAS": {
            "t1": [
                1238397047.2182617
            ],
            "v1": [
                5.0438767766574255
            ],
            "v2": [
                4.371670270544587
            ]
        }
    }
}
and try to load them using json.load(open(file_path)) to duplicate the problem.
You chose not to offer a reprex. Here is the code I'm running, which is intended to represent what you're running. If there is some discrepancy, update the original question to clarify the details.
import json
from io import StringIO
some_data = dict(a=1)
more_data = dict(b=2)
data = {"meta_data": some_data, "real_data": more_data}
file = StringIO()
json.dump(data, file)
file.seek(0)
d = json.load(file)
print(json.dumps(d, indent=4))
output
{
    "meta_data": {
        "a": 1
    },
    "real_data": {
        "b": 2
    }
}
As is apparent, under the circumstances you have described, the JSON library does exactly what we would expect of it.
EDIT
Your screenshot makes it pretty clear that a bunch of ASCII NUL characters are appended to the 1st file. We can easily reproduce that JSONDecodeError: Extra data symptom by adding a single line:
json.dump(data, file)
file.write(chr(0))
(Or perhaps chr(0) * 80 more closely matches the truncated screenshot.)
If your file ends with extraneous characters, such as NUL, then it will no longer be valid JSON, and compliant parsers will report a diagnostic message when they attempt to read it. And there's nothing special about NUL: a simple file.write("X") suffices to produce the same diagnostic. You will need to trim those NULs from the file's end before attempting to parse it.
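A minimal sketch of stripping trailing NULs before parsing (assuming the junk appears only at the end of the file):

import json

with open("data.json", "r", encoding="utf-8") as f:
    text = f.read()

# discard trailing NUL characters (and stray whitespace) before parsing
data = json.loads(text.rstrip("\x00").rstrip())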
For best results, use UTF-8 encoding with no BOM. Your editor should have settings for switching to UTF-8. Use $ file foo.json to verify encoding details, and $ iconv --to-code=UTF-8 < foo.json to alter an unfortunate encoding.
You need to read the file; you can do either of these:

data = json.loads(open("data.json").read())

or

with open("data.json", "r") as file:
    data = json.load(file)
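The second form is generally preferred, since the with block guarantees the file is closed when you're done with it.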
What would be the Python code to alphabetize the whitelist values in the JSON file below, without altering the formatting? Some combination of json.loads(), sorted(), and json.dumps()?
If I just use sorted() on the whitelist, I get a plain list without the \n after each item that maintains the formatting in the text file.
{
    "query": "python",
    "desired_count": 10,
    "batch_limit": 10,
    "optional": {
        "tld": "",
        "lang": "",
        "safe": "",
        "country": ""
    },
    "whitelist": [
        "google-analytics.com",
        "w3.org",
        "jquery.com",
        "jsdelivr.net",
        "polyfill.io",
        "recaptcha.net",
        "youtube-nocookie.com",
        "youtube.com",
        "ytimg.com",
        "vimeo.com",
        "vimeocdn.com",
        "hearstapps.com",
        "highcharts.com",
        "paypal.com",
        "paypalobjects.com",
        "creativecommons.org",
        "licensebuttons.net"
    ]
}
You can try the below. Using indent with json.dump will keep your formatting:

import json

with open('C:\\Users\\dell\\Desktop\\test.json', 'r') as read_file:
    jsondict = json.load(read_file)

# sort the whitelist in place
jsondict["whitelist"].sort()

with open("C:\\Users\\dell\\Desktop\\test1.json", "w") as write_file:
    json.dump(jsondict, write_file, indent=2)  # encode dict into JSON
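Note that list.sort() sorts the list in place; if you use sorted(jsondict["whitelist"]) instead, you need to assign the returned list back to the key.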
I have a .txt file with JSON structures. The problem is that the file does not contain only JSON structures, but also raw text such as log errors:
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
    "name": "1111",
    "results": [{
        "filename": "xxxx",
        "numberID": "7412"
    }, {
        "filename": "xgjhh",
        "numberID": "E52"
    }]
}
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
    "name": "jfkjgjkf",
    "results": [{
        "filename": "hhhhh",
        "numberID": "478962"
    }, {
        "filename": "jkhgfc",
        "number": "12544"
    }]
}
I read the .txt file, but when trying to parse the JSON structures I get an error:
IN :
import json

with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
    json_data = json.load(f)
OUT : json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)
I would like to parse the JSON and save it as a CSV file.
A more general solution for parsing a file with JSON objects mixed with other content, without making any assumption about the non-JSON content, is to split the file content into fragments by the curly brackets. Start with the first fragment that is an opening curly bracket, then join the following fragments one by one until the joined string is parsable as JSON:
import re
import json

# f is the file object opened over the mixed log/JSON text
fragments = iter(re.split('([{}])', f.read()))
while True:
    try:
        # skip fragments until the next opening curly bracket
        while True:
            candidate = next(fragments)
            if candidate == '{':
                break
        # keep appending fragments until the candidate parses as JSON
        while True:
            candidate += next(fragments)
            try:
                print(json.loads(candidate))
                break
            except json.decoder.JSONDecodeError:
                pass
    except StopIteration:
        break
This outputs:
{'name': '1111', 'results': [{'filename': 'xxxx', 'numberID': '7412'}, {'filename': 'xgjhh', 'numberID': 'E52'}]}
{'name': 'jfkjgjkf', 'results': [{'filename': 'hhhhh', 'numberID': '478962'}, {'filename': 'jkhgfc', 'number': '12544'}]}
This solution will strip out the non-JSON content and wrap the JSON structures in a containing JSON structure. This should do the job for you. I'm posting it as-is for expediency and will edit my answer afterwards with a clearer explanation:
import json

with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
    # drop the log lines, then mark blank lines as split points between JSON structures
    cleaned = ''.join([item.strip() if item.strip() != '' else '-split_here-'
                       for item in f.readlines() if '|INFO|' not in item]).split('-split_here-')

json_data = json.loads('{"entries":[' + ''.join([entry + ', ' for entry in cleaned])[:-2] + ']}')
Output:
{"entries":[{"name": "1111","results": [{"filename": "xxxx","numberID": "7412"}, {"filename": "xgjhh","numberID": "E52"}]}, {"name": "jfkjgjkf","results": [{"filename": "hhhhh","numberID": "478962"}, {"filename": "jkhgfc","number": "12544"}]}]}
What's going on here?
In the cleaned = ... line, we're using a list comprehension that keeps the lines of the file (f.readlines()) that do not contain the string |INFO| and substitutes the string -split_here- wherever there's a blank line (where .strip() yields '').
Then, we're joining that list of lines (''.join()) into a single string.
Finally, we're splitting that string (.split('-split_here-')) into a list of strings, separating the JSON structures into their own entries, marked by the blank lines in data.txt.
In the json_data = ... line, we're appending a ', ' to each of the JSON structures using a list comprehension.
Then, we convert that list back into a single string, stripping off the last ', ' (''.join()[:-2]; [:-2] slices the last two characters off the string).
We then wrap the string with '{"entries":[' and ']}' to make the whole thing a valid JSON structure, and feed it to json.loads to load your data as a Python object.
You could do one of several things:
On the Command Line, remove all lines where, say, "|INFO|Technical|" appears (assuming this appears in every line of raw text):
sed -i '' -e '/\|INFO\|Technical/d' yourfilename (if on Mac),
sed -i '/\|INFO\|Technical/d' yourfilename (if on Linux).
Move these raw lines into their own JSON fields
Use the "text structures" as a delimiter between JSON objects.
Iterate over the lines in the file, saving them to a buffer until you encounter a line that is a text line, at which point parse the lines you've saved as a JSON object.
import re
import json

def is_text(line):
    # returns True if line starts with a date and time in "YYYY-MM-DD HH:MM:SS" format
    line = line.lstrip('|')  # you said some lines start with a leading |, remove it
    return re.match(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", line)

json_objects = []

with open("data.txt") as f:
    json_lines = []
    for line in f:
        if not is_text(line):
            json_lines.append(line)
        else:
            # if there's multiple text lines in a row json_lines will be empty
            if json_lines:
                json_objects.append(json.loads("".join(json_lines)))
                json_lines = []

# we still need to parse the remaining object in json_lines
# if the file doesn't end in a text line
if json_lines:
    json_objects.append(json.loads("".join(json_lines)))

print(json_objects)
Repeating the logic in the last two lines is a bit ugly, but you need to handle the case where the last line in your file is not a text line, so when you're done with the for loop you need to parse the last object sitting in json_lines, if there is one.
I'm assuming there's never more than one JSON object between text lines, and also my regex for a date will break in 8,000 years.
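If the duplication bothers you, one variation (my own sketch, not part of the answer above) is to chain a sentinel text line onto the end of the file iterator, so the final buffered object is flushed inside the loop:

from itertools import chain

json_objects = []
with open("data.txt") as f:
    json_lines = []
    # the sentinel matches is_text(), forcing the last buffered object to be parsed
    for line in chain(f, ["9999-01-01 00:00:00 sentinel\n"]):
        if not is_text(line):
            json_lines.append(line)
        elif json_lines:
            json_objects.append(json.loads("".join(json_lines)))
            json_lines = []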
You could count curly brackets in your file to find the beginning and end of each JSON object, and store the parsed objects in a list, here found_jsons.
import json

# content holds the full text of the file, e.g. content = open("data.txt").read()
open_chars = 0
saved_content = []
found_jsons = []

for i in content.splitlines():
    open_chars += i.count('{')
    if open_chars:
        saved_content.append(i)
    open_chars -= i.count('}')
    if open_chars == 0 and saved_content:
        found_jsons.append(json.loads('\n'.join(saved_content)))
        saved_content = []

for i in found_jsons:
    print(json.dumps(i, indent=4))
Output
{
    "results": [
        {
            "numberID": "7412",
            "filename": "xxxx"
        },
        {
            "numberID": "E52",
            "filename": "xgjhh"
        }
    ],
    "name": "1111"
}
{
    "results": [
        {
            "numberID": "478962",
            "filename": "hhhhh"
        },
        {
            "number": "12544",
            "filename": "jkhgfc"
        }
    ],
    "name": "jfkjgjkf"
}
I've got a JSON file that I've pulled from a web service and am trying to parse. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the JSON data in each example appears to be very simplistic in nature. Likewise, the JSON example data in the Python docs is very simple and does not reflect what I'm trying to work with. Here is what the JSON looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'

with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Summary"]:
        print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
The example you posted is not valid JSON (no commas after some object fields), so it's hard to dig in much; if it came straight from the web service, something's messed up. Assuming you fixed it with proper commas, the "Summary" key sits inside the "Results" object (which itself sits inside "RecordResponse"), so you'd need to change your loop to
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["RecordResponse"]["Results"]["Summary"]:
        print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                sub = findfieldsiwant(val)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
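A usage sketch (assuming json_obj is the dict loaded above):

fields = findfieldsiwant(json_obj)  # search the whole parsed object
if fields:
    print(fields)  # e.g. ['value i want is here']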