How to save multiple JSON strings in a single file? - python

I have the following multiple JSON strings in a file:
{
"col1": "e1",
"col2": "e5"
}
{
"col1": "a1",
"col2": "a4",
"col3": "e8"
}
This is how I read this file:
import json
data = []
for line in open('test.json', 'r', encoding='cp850'):
    data.append(json.loads(line))
Is there any way to save this file back in the following format (i.e. all JSON strings wrapped inside [..] with commas separating them):
[
{
"col1": "e1",
"col2": "e5"
},
{
"col1": "a1",
"col2": "a4",
"col3": "e8"
}
]
For those who need some examples as a proof of whatever, I tried this:
with open('output.json', 'w') as f:
    json.dump(data, f)
It does not write the content of data into JSON file.
And a dirty solution that does not work either:
data = ""
with open('test.json', 'r', encoding='cp850') as myfile:
    data = str(json.loads(line)) + "," + data
data = data[:len(data)-1] + "]" + data[len(data):]
data = "[" + data
with open('output.json', 'w') as outfile:
    json.dump(data, outfile)

It's not a good idea to have multiple JSON documents in one file without some kind of delimiter.
I recommend newline-delimited JSON (meaning each JSON object is confined to one line of the file):
{"name": "json1"}
{"name": "json2"}
{"name": "json3"}
Then you can simply iterate over the file and do something like this:
import json
objs = []
with open("jsons.txt") as f:
    for line in f:
        objs.append(json.loads(line))
# Write all collected objects out as a single JSON array
with open("jsons.json", "w") as f:
    json.dump(objs, f)

Try this:
data = \
"""{
"col1": "e1",
"col2": "e5"
}
{
"col1": "a1",
"col2": "a4",
"col3": "e8"
}
"""
new_json = []
bracket_lvl = 0
curr_json = ''
for c in data:
    curr_json += c
    if c in (' ', '\t', '\n'):
        continue
    if c == '{':
        bracket_lvl += 1
    if c == '}':
        bracket_lvl -= 1
    if bracket_lvl == 0:
        new_json.append(curr_json)
        curr_json = ''
output = "[\n" + ','.join(new_json) + "\n]"
print(output)
Output:
[
{
"col1": "e1",
"col2": "e5"
},
{
"col1": "a1",
"col2": "a4",
"col3": "e8"
}
]
Edit:
Please note that this only works if the JSON strings are surrounded by curly braces. If you need code that handles square brackets, I'll need to edit this slightly.
Edit 2:
If you want to save it to a file, use this at the bottom of the program:
with open('output.json', 'w') as f:
    f.write(output)
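As an alternative to counting braces by hand, the standard library's json.JSONDecoder.raw_decode can pull one JSON value at a time out of a string that holds several concatenated values. The following is a minimal sketch and not part of the answers above; the helper name iter_json_values is made up for illustration, and the sample string mirrors the two objects from the question. Unlike brace counting, this approach is not confused by braces that appear inside string values.

```python
import json


def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        yield value


sample = '{\n"col1": "e1",\n"col2": "e5"\n}\n{\n"col1": "a1",\n"col2": "a4",\n"col3": "e8"\n}\n'
objects = list(iter_json_values(sample))

# Serialize everything as a single JSON array
output = json.dumps(objects, indent=4)
print(output)
```

To save the result, json.dump(objects, f, indent=4) inside a with open('output.json', 'w') as f: block should write the same array to disk.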

Related

Writing to text file while reading one. (ValueError: I/O operation on closed file.)

I'm currently working on a script that rearranges JSON data in a more basic manner so I can run it through another YOLO box plot script. So far I've managed to make the script print the data in the exact format that I wish for. However, I would like to save it in a text file so I don't have to copy/paste it every time. Doing this seemed to be more difficult than first anticipated.
So here is the code that currently "works":
import sys
data = open(sys.argv[1], 'r')
with data as file:
    for line in file:
        split = line.split()
        if split[0] == '"x":':
            print("0", split[1][0:8], end = ' ')
        if split[0] == '"y":':
            print(split[1][0:8], end = ' ')
        if split[0] == '"w":':
            print(split[1][0:8], end = ' ')
        if split[0] == '"h":':
            print(split[1][0:8])
And here is an example of the dataset that will be run through this script:
{
"car": {
"count": 7,
"instances": [
{
"bbox": {
"x": 0.03839285671710968,
"y": 0.8041666746139526,
"w": 0.07678571343421936,
"h": 0.16388888657093048
},
"confidence": 0.41205787658691406
},
{
"bbox": {
"x": 0.9330357313156128,
"y": 0.8805555701255798,
"w": 0.1339285671710968,
"h": 0.2222222238779068
},
"confidence": 0.8200334906578064
},
{
"bbox": {
"x": 0.15803571045398712,
"y": 0.8111110925674438,
"w": 0.22678571939468384,
"h": 0.21111111342906952
},
"confidence": 0.8632314801216125
},
{
"bbox": {
"x": 0.762499988079071,
"y": 0.8916666507720947,
"w": 0.1428571492433548,
"h": 0.20555555820465088
},
"confidence": 0.8819259405136108
},
{
"bbox": {
"x": 0.4178571403026581,
"y": 0.8902778029441833,
"w": 0.17499999701976776,
"h": 0.17499999701976776
},
"confidence": 0.8824222087860107
},
{
"bbox": {
"x": 0.5919643044471741,
"y": 0.8722222447395325,
"w": 0.16607142984867096,
"h": 0.25
},
"confidence": 0.8865317106246948
},
{
"bbox": {
"x": 0.27767857909202576,
"y": 0.8541666865348816,
"w": 0.2053571492433548,
"h": 0.1805555522441864
},
"confidence": 0.8922017216682434
}
]
}
}
The outcome will be looking like this:
0 0.038392 0.804166 0.076785 0.163888
0 0.933035 0.880555 0.133928 0.222222
0 0.158035 0.811111 0.226785 0.211111
0 0.762499 0.891666 0.142857 0.205555
0 0.417857 0.890277 0.174999 0.174999
0 0.591964 0.872222 0.166071 0.25
0 0.277678 0.854166 0.205357 0.180555
Instead of printing these lines I've tried writing them to a new text file, however, I keep getting the "ValueError: I/O operation on closed file." error. I would guess this is because I already have one open and opening a new one will close the first one? Is there an easy way to work around this? Or is the hassle too much to bother and copy/pasting the print result is the "easiest" way?
Why don't you use the json and csv packages?
import csv
import json
# import sys
# file = sys.argv[1]
file = "input.json"
output_file = "output.csv"
with open(file, "r") as data_file:
    data = json.load(data_file)
with open(output_file, "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=' ')
    for value in data.values():
        instances = value.get("instances")
        bboxes = [instance.get("bbox") for instance in instances]
        for bbox in bboxes:
            writer.writerow([
                0,
                f"{bbox['x']:.6f}",
                f"{bbox['y']:.6f}",
                f"{bbox['w']:.6f}",
                f"{bbox['h']:.6f}",
            ])
Output:
0 0.038393 0.804167 0.076786 0.163889
0 0.933036 0.880556 0.133929 0.222222
0 0.158036 0.811111 0.226786 0.211111
0 0.762500 0.891667 0.142857 0.205556
0 0.417857 0.890278 0.175000 0.175000
0 0.591964 0.872222 0.166071 0.250000
0 0.277679 0.854167 0.205357 0.180556
Notes:
It's important to understand the input file format you are working with. Read about JSON here.
I round the values to 6 decimal places in both examples (I'm not sure what the requirements are; simply modify f"{bbox['x']:.6f}" and the three lines after it to suit your use case).
Or, if you want to use jmespath along with csv and json:
import csv
import json
import jmespath # pip install jmespath
# import sys
# file = sys.argv[1]
file = "input.json"
output_file = "output.csv"
with open(file, "r") as data_file:
    data = json.load(data_file)
bboxes = jmespath.search("*.instances[*].bbox", data)
with open(output_file, "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=' ')
    for bbox in bboxes[0]:
        writer.writerow([
            0,
            f"{bbox['x']:.6f}",
            f"{bbox['y']:.6f}",
            f"{bbox['w']:.6f}",
            f"{bbox['h']:.6f}",
        ])
I suggest parsing the file as JSON rather than raw text. If the file is JSON, treat it as JSON in order to avoid the unfortunate case in which it is valid, minified JSON and the lack of line breaks makes treating it as a string a nightmare of regexes that are likely fragile. Or possibly worse, the file is invalid JSON.
import json
import sys
with open(sys.argv[1], 'r') as f:
    raw = f.read()
obj = json.loads(raw)
print("\n".join(
    f"0 {i['bbox']['x']:.6f} {i['bbox']['y']:.6f} {i['bbox']['w']:.6f} {i['bbox']['h']:.6f}"
    for i in obj["car"]["instances"])
)
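On the error itself: "ValueError: I/O operation on closed file" is raised when a file object is used after it has been closed, which typically happens when a read or write ends up outside the with block that opened the file. Opening a second file does not close the first one. A minimal sketch of reading and writing at the same time follows; the function name write_yolo_lines and its path parameters are hypothetical, and it assumes the input has the structure shown in the question.

```python
import json


def write_yolo_lines(src_path, dst_path):
    """Read detections from src_path and write one '0 x y w h' line per box."""
    # Both files are opened in one with statement and stay open
    # until the end of this block, so reads and writes can be mixed.
    with open(src_path) as src, open(dst_path, "w") as dst:
        data = json.load(src)
        for value in data.values():
            for instance in value["instances"]:
                bbox = instance["bbox"]
                dst.write(
                    f"0 {bbox['x']:.6f} {bbox['y']:.6f} "
                    f"{bbox['w']:.6f} {bbox['h']:.6f}\n"
                )
```

Calling write_yolo_lines(sys.argv[1], "output.txt") would then replace the copy/paste step; any write attempted after the with block ends is what triggers the closed-file error.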

Python unable to parse Json file with error "raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data"

I am trying to download a Json file from an API and convert it into a csv file, but the script throws the below error while parsing the json file.
After every 100 records, the JSON file closes the array with "]" and starts another one with "[". This is not accepted as valid JSON. Could you please suggest how I can efficiently handle the "]" and "[" that appear every 100 records? The code works fine for fewer than 100 records, without the extra [] brackets.
Error message:
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data
Json file format:
[
{
"A": "5",
"B": "811",
"C": [
{ "C1": 1,
"C2": "sa",
"C3": 3
}
],
"D": "HH",
"E": 0,
"F": 6
},
{
"A": "5",
"B": "811",
"C": [
{ "C1": 1,
"C2": "fa",
"C3": 3
}
],
"D": "HH",
"E": 0,
"F": 6
}
]
[
{
"A": "5",
"B": "811",
"C": [
{ "C1": 1,
"C2": "da",
"C3": 3
}
],
"D": "HH",
"E": 0,
"F": 6
}
]
Code:
import json
import pandas as pd
from flatten_json import flatten
def json2excel():
    file_path = r"<local file path>"
    json_list = json.load(open(file_path + '.json', 'r', encoding='utf-8', errors='ignore'))
    key_list = ['A', 'B']
    json_list = [{k: d[k] for k in key_list} for d in json_list]
    # Flatten and convert to a data frame
    json_list_flattened = (flatten(d, '.') for d in json_list)
    df = pd.DataFrame(json_list_flattened)
    # Export to CSV in the same directory with the original file name
    export_csv = df.to_csv(file_path + r'.csv', sep=',', encoding='utf-8', index=None, header=True)
def main():
    json2excel()
I would recommend pre-processing the data you receive from your API first. The pre-processed data can then be fed to a JSON parser.
I came up with some simple Python code that is just a small tweak to the solution of the parenthesis-matching problem. Here's my working code, which you can use to pre-process your data.
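def build_json_items(custom_json):
    # Note: brackets inside string values are not handled by this simple matcher.
    open_tup = tuple('({[')
    close_tup = tuple(')}]')
    pairs = dict(zip(open_tup, close_tup))  # named pairs to avoid shadowing the built-in map
    queue = []  # used as a stack of the closing brackets still expected
    json_items = []
    temp = ""
    for i in custom_json:
        if not queue and i.isspace():
            # Skip whitespace (e.g. newlines) between top-level items
            continue
        if i in open_tup:
            queue.append(pairs[i])
        elif i in close_tup:
            if not queue or i != queue.pop():
                # Unbalanced input: return the same (flag, items) shape as below
                return False, json_items
        if len(queue) == 0:
            # We have reached a point where everything so far is balanced.
            # This is the point where we can separate out the expression
            temp = temp + str(i)
            json_items.append(temp)
            temp = ""  # Re-initialize
        else:
            temp = temp + str(i)
    if not queue:
        # Provided string is balanced
        return True, json_items
    else:
        return False, json_items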
The build_json_items function takes your custom JSON payload and splits it into individual valid JSON items, based on the information you provided in your question. Here's an example of how you can call it:
input_data = "[{\"A\":\"5\",\"B\":\"811\",\"C\":[{\"C1\":1,\"C2\":\"sa\",\"C3\":3}],\"D\":\"HH\",\"E\":0,\"F\":6},{\"A\":\"5\",\"B\":\"811\",\"C\":[{\"C1\":1,\"C2\":\"fa\",\"C3\":3}],\"D\":\"HH\",\"E\":0,\"F\":6}][{\"A\":\"5\",\"B\":\"811\",\"C\":[{\"C1\":1,\"C2\":\"da\",\"C3\":3}],\"D\":\"HH\",\"E\":0,\"F\":6}]"
is_balanced, json_items = build_json_items(input_data)
print(f"Available JSON items: {len(json_items)}")
print("JSON items are the following")
for i in json_items:
    print(i)
Here's the output of the print statements.
Available JSON items: 2
JSON items are the following
[{"A":"5","B":"811","C":[{"C1":1,"C2":"sa","C3":3}],"D":"HH","E":0,"F":6},{"A":"5","B":"811","C":[{"C1":1,"C2":"fa","C3":3}],"D":"HH","E":0,"F":6}]
[{"A":"5","B":"811","C":[{"C1":1,"C2":"da","C3":3}],"D":"HH","E":0,"F":6}]
Once you have those payloads separated into valid JSON structures, you can feed them to your JSON parser.
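A short sketch of that last step, under the assumption that every separated item is a JSON array of records as in the question: each item is parsed on its own, and the resulting lists are flattened into one list that can be passed on to the flatten / DataFrame code. The payloads here are shortened versions of the items printed above, kept to two keys for readability.

```python
import json
from itertools import chain

# Two separated payloads, shortened from the printed output
json_items = [
    '[{"A":"5","B":"811"},{"A":"5","B":"811"}]',
    '[{"A":"5","B":"811"}]',
]

# Parse each chunk on its own, then flatten the per-chunk lists
# into a single list of records
records = list(chain.from_iterable(json.loads(item) for item in json_items))
print(len(records))
```

The records list then plays the role of json_list in the question's json2excel function.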

How to append data to JSON list that doesn't contains any key for that list?

[
{
"name": "name one",
"id": 1
},
{
"name": "name two",
"id": 2
}
]
I want to append an object to the list in a .json file. How do I do that?
You could read the existing JSON content, update it, and rewrite the updated list.
import json
with open("myfile.json", "r+") as f:
    my_file = f.read()  # read the current content
    my_list = json.loads(my_file)  # parse the JSON array into a Python list
    dict_obj = {
        "name": "name three",
        "id": 3
    }
    my_list.append(dict_obj)
    f.seek(0)  # move to the beginning of the file
    f.truncate()  # clear the previous content
    print(f" going to rewrite {my_list}")
    f.write(json.dumps(my_list))  # write the updated list back to the file
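The same read, append, rewrite cycle can be wrapped in a small helper. This is only a sketch: the function name append_to_json_list is hypothetical, and writing to a temporary file before swapping it in is an added design choice (so that a crash mid-write does not leave a half-written file), not something the answer above requires.

```python
import json
import os
import tempfile


def append_to_json_list(path, item):
    """Append item to the top-level JSON array stored in the file at path."""
    with open(path, "r", encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("top-level JSON value is not a list")
    items.append(item)

    # Write to a temporary file in the same directory, then replace the
    # original with it in one step
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(items, tmp, indent=4)
    os.replace(tmp_path, path)
    return items
```

For the file in the question, append_to_json_list("myfile.json", {"name": "name three", "id": 3}) would add the third entry.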
I'm not entirely sure of what you are asking but perhaps the code below will help:
const myList = [
{
"name": "name one",
"id": 1
},
{
"name": "name two",
"id": 2
}
]
const myNewItem = {
"name": "name three",
"id": 3
}
const addItemIfDifferentId = (list, newItem) => list.some(({id}) => id === newItem.id) ? list : [...list, {...newItem}]
const newList = addItemIfDifferentId(myList, myNewItem)
newList
Maybe this will help you:
import json
# json.load parses the file straight into a Python object (here, a dict):
with open('data.json') as json_file:
    z = json.load(json_file)  # e.g. {"key1": "123", "key2": "456", "key3": "789"}
# Python object to be merged in
y = {"key4": "101112"}
# Merge the data (update works because the top level is a JSON object;
# for a top-level list, use z.append(y) instead)
z.update(y)
# json.dumps turns the result back into a JSON string:
print(json.dumps(z))
with open('data.json', 'w') as outfile:
    json.dump(z, outfile)

Need to cut off some unnecessary information from a JSON file and preserve the JSON structure

I have a JSON file
[
{
"api_key": "123123112313121321",
"collaborators_count": 1,
"created_at": "",
"custom_event_fields_used": 0,
"discarded_app_versions": [],
"discarded_errors": [],
"errors_url": "https://api.bugsnag.com/projects/1231231231312/errors",
"events_url": "https://api.bugsnag.com/projects/1231231231213/events",
"global_grouping": [],
"html_url": "https://app.bugsnag.com/lol/kek/",
"id": "34234243224224",
"ignore_old_browsers": true,
"ignored_browser_versions": {},
"is_full_view": true,
"language": "javascript",
"location_grouping": [],
"name": "asdasdaasd",
"open_error_count": 3,
"release_stages": [
"production"
],
"resolve_on_deploy": false,
"slug": "wqeqweqwwqweq",
"type": "js",
"updated_at": "2020-04-06T15:22:10.480Z",
"url": "https://api.bugsnag.com/projects/12312312213123",
"url_whitelist": null
}
]
What I need is to remove all lines apart from "id:" and "name:" and preserve the JSON structure. Can anybody advise a Python or bash script to handle this?
With jq:
$ jq 'map({id: .id, name: .name})' input.json
[
{
"id": "34234243224224",
"name": "asdasdaasd"
}
]
Using Python, you could first deserialize the JSON file (a JSON array of objects) with json.load, then keep only the keys you want with a list comprehension:
from json import load
keys = ["name", "id"]
with open("test.json") as json_file:
    data = load(json_file)
filtered_json = [{k: obj.get(k) for k in keys} for obj in data]
print(filtered_json)
Output:
[{'name': 'asdasdaasd', 'id': '34234243224224'}]
If we want to serialize this python list to another output file, we can use json.dump:
from json import load
from json import dump
keys = ["name", "id"]
with open("test.json") as json_file, open("output.json", mode="w") as json_output:
    data = load(json_file)
    filtered_json = [{k: obj.get(k) for k in keys} for obj in data]
    dump(filtered_json, json_output, indent=4, sort_keys=True)
output.json
[
{
"id": "34234243224224",
"name": "asdasdaasd"
}
]
You can try this:
import json
with open('<input filename>', 'r') as f:
    data = json.load(f)
new_data = []
for item in data:
    new_item = {key: value for key, value in item.items() if key == "id" or key =="name"}
    new_data.append(new_item)
with open('<output filename>', 'w') as f:
    json.dump(new_data, f)
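Convert your JSON into a pandas DataFrame and keep only the columns you need:
import pandas as pd
# dtype=False is meant to stop pandas from converting string IDs to numbers
df = pd.read_json('input.json', dtype=False)  # path to your JSON file
# Select the wanted columns; df.drop(['url_whitelist', 'api_key'], axis=1) would instead remove specific ones
res = df[['id', 'name']]
print(res.to_json(orient='records'))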

Convert a complex layered JSON to CSV

I am trying to parse JSON and write the results into a CSV file. The "name" values are supposed to be the column headers, and the "value" values are what need to be stored. The CSV writer does not separate the strings with commas; I get: eventIdlistingsvenueperformer. When I try something like header = col['name']+',' I get: eventId","listings","venue","performer and the result isn't read as a CSV file. My questions are: am I going about this the right way, and how can I separate the strings with commas?
"results": [
{
"columns": [
{
"name": "eventId",
"value": "XXXX",
"defaultHidden": false
},
{
"name": "listings",
"value": "8",
"defaultHidden": false
},
{
"name": "venue",
"value": "Nationwide Arena",
"defaultHidden": false
}]
This is my code:
json_decode=json.loads(data)
report_result = json_decode['results']
with open('testReport2.csv','w') as result_data:
    csvwriter = csv.writer(result_data,delimiter=',')
    count = 0
    for res in report_result:
        deeper = res['columns']
        for col in deeper:
            if count == 0:
                header = col['name']
                csvwriter.writerow([header,])
        count += 1
    for written in report_result:
        deeper =res['columns']
        for col in deeper:
            csvwriter.writerow([trouble,])
result_data.close()
Try the code below:
import csv
import json
json_decode = json.loads(data)
report_result = json_decode['results']
new_dict = {}
for result in report_result:
    columns = result["columns"]
    for value in columns:
        # Map each column name to its value; names become the CSV headers
        new_dict[value['name']] = value['value']
with open('testReport2.csv', 'w', newline='') as result_data:
    csvwriter = csv.DictWriter(result_data, delimiter=',', fieldnames=list(new_dict))
    csvwriter.writeheader()
    csvwriter.writerow(new_dict)
Try this:
import csv
import json
json_decode = json.loads(data)
report_result = json_decode['results']
with open('testReport2.csv', 'w', newline='') as result_data:
    csvwriter = csv.writer(result_data, delimiter=',')
    # Header row is taken from the keys of the first column entry
    header = list(report_result[0]['columns'][0].keys())
    csvwriter.writerow(header)
    for written in report_result:
        for row in written['columns']:
            deeper = row.values()
            csvwriter.writerow(deeper)
# No explicit close() is needed; the with block closes the file
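If the goal is one CSV row per entry in "results", with the "name" values as column headers, a sketch along the following lines may be closer to what the question asks for. The sample data string is an assumption: it is the fragment from the question wrapped in braces and closed so that it parses. An io.StringIO buffer stands in for the output file to keep the example self-contained.

```python
import csv
import io
import json

# Assumed complete version of the fragment shown in the question
data = (
    '{"results": [{"columns": ['
    '{"name": "eventId", "value": "XXXX", "defaultHidden": false},'
    '{"name": "listings", "value": "8", "defaultHidden": false},'
    '{"name": "venue", "value": "Nationwide Arena", "defaultHidden": false}'
    ']}]}'
)

results = json.loads(data)["results"]

# Turn each result into a {name: value} dict, one dict per CSV row
rows = [{col["name"]: col["value"] for col in result["columns"]} for result in results]

# Collect the header names in first-seen order across all rows
fieldnames = list(dict.fromkeys(name for row in rows for name in row))

buffer = io.StringIO()  # stands in for open('testReport2.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Rows that lack one of the collected names are written with an empty cell for it, which is csv.DictWriter's default behavior.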
