Python - Issue with double quotes while writing to a JSON file - python

I am trying to convert an XML file to JSON (a condensed version of the code is provided below).
The issue I am facing is with a tag which can have multiple values (example below). I cannot directly make it a dict, since the key (NAME) can have multiple values. The output generated by the code vs. the expected output is given below.
Python script:
import json
mylist = ['"Event" : "BATCHS01-wbstp01"', '"Event" : "BATCHS01-wbstrt01"']
tmpdict = {}
tmpdict['Events'] = mylist
with open('test.json', 'w') as fp:
    json.dump(tmpdict, fp, indent=4, sort_keys=False)
Output Generated:
{
    "Events": [
        "\"Event\" : \"BATCHS01-wbstp01\"",
        "\"Event\" : \"BATCHS01-wbstrt01\""
    ]
}
Expected Output:
{
    "Events": [
        {"Event" : "BATCHS01-wbstp01"},
        {"Event" : "BATCHS01-wbstrt01"}
    ]
}

The issue is that your mylist is an array of strings rather than an array of objects.
You need to remove the outer quotes to make it:
mylist = [{"Event" : "BATCHS01-wbstp01"}, {"Event" : "BATCHS01-wbstrt01"}]
I don't see why you cannot produce this structure from XML. It's rather simple regardless of whether 'key (NAME) can have multiple values'.
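For instance, with xml.etree.ElementTree from the standard library, a repeated tag maps naturally to a list of dicts. The XML layout below is invented for illustration, since the question doesn't show the source document:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML shape -- the real input isn't shown in the question
xml_data = """
<Events>
    <Event>BATCHS01-wbstp01</Event>
    <Event>BATCHS01-wbstrt01</Event>
</Events>
"""

root = ET.fromstring(xml_data)
# One dict per <Event> element, instead of a pre-quoted string
mylist = [{"Event": e.text} for e in root.findall("Event")]
# mylist is now [{'Event': 'BATCHS01-wbstp01'}, {'Event': 'BATCHS01-wbstrt01'}]
```

Passing that list to json.dump produces the expected output directly, with no quote-stripping needed.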

You can salvage your data by first converting it to valid JSON piecewise and then dumping the JSON into a string or a file:
tmpdict = {"Events" : [json.loads('{' + item + '}') for item in mylist]}
json.dumps(tmpdict)
'{"Events": [{"Event": "BATCHS01-wbstp01"}, {"Event": "BATCHS01-wbstrt01"}]}'

Code:
You can first convert the XML pieces to dicts like:
tmpdict['Events'] = [json.loads('{%s}' % x) for x in mylist]
Test Code:
import json
mylist = ['"Event" : "BATCHS01-wbstp01"', '"Event" : "BATCHS01-wbstrt01"']
tmpdict = {}
tmpdict['Events'] = [json.loads('{%s}' % x) for x in mylist]
with open('test.json', 'w') as fp:
    json.dump(tmpdict, fp, indent=4, sort_keys=False)
Results:
{
    "Events": [
        {
            "Event": "BATCHS01-wbstp01"
        },
        {
            "Event": "BATCHS01-wbstrt01"
        }
    ]
}

Related

How to Convert Text in Non-structured Format to JSON Format

Content of a Sample Input Text
{'key1':'value1','msg1':"content1"} //line 1
{'key2':'value2','msg2':"content2"} //line 2
{'key3':'value3','msg3':"content3"} //line 3
Also pointing out some notable characteristics of the input text:
It lacks a proper delimiter; currently each object {...} takes a new line "\n"
It contains single quotes, which is an issue since JSON (the expected output) accepts only double quotes
It lacks the opening and closing brackets required by JSON
Expected Output JSON
{
{
"key1":"value1",
"msg1":"content1"
},
{
"key2":"value2",
"msg2":"content2"
},
{
"key3":"value3",
"msg3":"content3"
}
}
What I have tried, but failed
json.dumps(input_text), but it cannot identify "\n" as the "delimiter"
Appending a comma at the end of each object {...}, but encountered the issue of extra comma when it comes to the last object
If you have one dictionary per line, you can replace the newlines with commas and enclose the whole thing in square brackets [ ] (you get a list of dictionaries).
You can use ast.literal_eval to import your file as list of dictionaries.
Finally export it to json:
import json
import ast
with open("file.txt", "r") as f:
    dic_list = ast.literal_eval("[" + f.read().replace('\n', ',') + "]")
print(json.dumps(dic_list, indent=4))
Output:
[
{
"key1": "value1",
"msg1": "content1"
},
{
"key2": "value2",
"msg2": "content2"
},
{
"key3": "value3",
"msg3": "content3"
}
]
Just use ast
import ast
with open('test.txt') as f:
    data = [ast.literal_eval(line) for line in f if line.strip()]
print(data)
output
[{'key1': 'value1', 'msg1': 'content1'}, {'key2': 'value2', 'msg2': 'content2'}, {'key3': 'value3', 'msg3': 'content3'}]

convert json to csv and store it in a variable in python

I am using the csv module to convert JSON to CSV and store it in a file or print it to stdout.
import csv
import sys

def write_csv(data: list, header: list, path: str = None):
    # data is JSON-format data as a list of dicts
    output_file = open(path, 'w') if path else sys.stdout
    out = csv.writer(output_file)
    out.writerow(header)
    for row in data:
        out.writerow([row[attr] for attr in header])
    if path:
        output_file.close()
I want to store the converted csv to a variable instead of sending it to a file or stdout.
say I want to create a function like this:
def json_to_csv(data: list, header: list):
    # convert json data into a csv string
    return string_csv
NOTE: the format of data is simple
data is a list of dictionaries mapping strings to strings
[
{
"username":"srbcheema",
"name":"Sarbjit Singh"
},
{
"username":"testing",
"name":"Test, user"
}
]
I want csv output to look like:
username,name
srbcheema,Sarbjit Singh
testing,"Test, user"
Converting JSON to CSV is not a trivial operation. There is also no standardized way to translate between them...
For example
my_json = {
"one": 1,
"two": 2,
"three": {
"nested": "structure"
}
}
Could be represented in a number of ways...
These are all (to my knowledge) valid CSVs that contain all the information from the JSON structure.
data
'{"one": 1, "two": 2, "three": {"nested": "structure"}}'
one,two,three
1,2,'{"nested": "structure"}'
one,two,three__nested
1,2,structure
In essence, you will have to figure out the best translation between the two based on your knowledge of the data. There is no right answer on how to go about this.
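As a sketch of the third convention above (nested keys joined into compound column names), here is one way it could be coded; the `flatten` helper and the `__` separator are choices made for this example, not a standard:

```python
import csv
import io

my_json = {"one": 1, "two": 2, "three": {"nested": "structure"}}

def flatten(d, prefix=""):
    """Flatten nested dicts into 'parent__child' keys (one possible convention)."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}__{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))   # recurse into nested objects
        else:
            out[key] = v
    return out

flat = flatten(my_json)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=flat.keys())
writer.writeheader()
writer.writerow(flat)
print(buf.getvalue())
```

This prints the `one,two,three__nested` header row followed by `1,2,structure`, matching the last representation shown above.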
I'm relatively new to Python so there's probably a better way, but this works:
def get_safe_string(string):
    return '"' + string + '"' if "," in string else string

def json_to_csv(data):
    csv_keys = data[0].keys()
    header = ",".join(csv_keys)
    res = list(",".join(get_safe_string(row.get(k)) for k in csv_keys) for row in data)
    res.insert(0, header)
    return "\n".join(r for r in res)
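One caveat: hand-rolled quoting like `get_safe_string` misses edge cases (embedded double quotes, newlines inside values) that the csv module already covers. A sketch of the same function built on `csv.writer` writing into an in-memory `io.StringIO` buffer:

```python
import csv
import io

def json_to_csv(data):
    """Render a list of dicts as a CSV string via csv.writer and a StringIO buffer."""
    header = list(data[0].keys())
    buf = io.StringIO()
    out = csv.writer(buf)
    out.writerow(header)
    for row in data:
        out.writerow([row[k] for k in header])
    return buf.getvalue()

data = [
    {"username": "srbcheema", "name": "Sarbjit Singh"},
    {"username": "testing", "name": "Test, user"},
]
print(json_to_csv(data))
```

With the default QUOTE_MINIMAL behavior, only the comma-containing value `Test, user` ends up quoted, matching the desired output.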

Convert Array of JSON Objects to CSV - Python [duplicate]

This question already has answers here:
How to read a JSON file containing multiple root elements?
(4 answers)
Closed 4 years ago.
I have converted a simple JSON to CSV successfully.
I am facing an issue when the file contains an array of JSON objects.
I am using the csv module, not pandas, for converting.
Please refer to the content below, which shows what is getting processed successfully and what is failing:
Success (when the file contains a single list/array of JSON objects):
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
Fail :
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
The json.loads function throws the following exception:
Extra data ; line 1 column 6789 (char 1234)
How can I process such files?
EDIT:
This file is flushed by Kinesis Firehose and pushed to S3.
I am using a Lambda to download the file, load it, and transform it,
so it is not a .json file.
Parse each line like so:
import json

with open('input.json') as f:
    for line in f:
        obj = json.loads(line)
Because your file is not valid JSON, you have to read it line by line and convert each line individually into an object.
Or, you can convert your file structure like this...
[
{
"value": 0.97,
"key_1": "value1",
"key_2": "value2",
"key_3": "value3",
"key_11": "2019-01-01T00:05:00Z"
},
{
"value": 0.97,
"key_1": "value1",
"key_2": "value2",
"key_3": "value3",
"key_11": "2019-01-01T00:05:00Z"
},
{
"value": 0.97,
"key_1": "value1",
"key_2": "value2",
"key_3": "value3",
"key_11": "2019-01-01T00:05:00Z"
}
]
and it will be a valid JSON file.
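If restructuring the producer isn't an option, the merge can also be done in code; a minimal sketch, using an in-memory list of the question's lines in place of the downloaded S3 file:

```python
import json

# The failing input: several lines, each a one-element JSON array
lines = [
    '[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]',
    '[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]',
    '[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]',
]

records = []
for line in lines:
    records.extend(json.loads(line))  # each line is itself a list, so extend

valid = json.dumps(records)  # one valid JSON array containing all the objects
```

After the merge, `records` is an ordinary list of dicts that the existing CSV conversion can consume.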
As tanaydin said, your failing input is not valid JSON. It should look something like this:
[
{
"value":0.97,
"key_1":"value1",
"key_2":"value2",
"key_3":"value3",
"key_11":"2019-01-01T00:05:00Z"
},
{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"},
{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}
]
I assume you're creating the json output by iterating over a list of objects and calling json.dumps on each one. You should create your list of dictionaries, then call json.dumps on the whole list instead.
list_of_dicts_to_jsonify = []  # must be a list, not a dict, since we append to it
object_attributes = ['value', 'key_1', 'key_2', 'key_3', 'key_11']
for item in list_of_objects:
    # Convert object to dictionary
    obj_dict = {}
    for k in object_attributes:
        obj_dict[k] = getattr(item, k) or None
    list_of_dicts_to_jsonify.append(obj_dict)
json_output = json.dumps(list_of_dicts_to_jsonify)

Multiple jsons to csv

I have multiple files, each containing multiple highly nested JSON rows. The first two rows of one such file look like:
{
"u":"28",
"evv":{
"w":{
"1":400,
"2":{
"i":[{
"l":14,
"c":"7",
"p":"4"
}
]
}
}
}
}
{
"u":"29",
"evv":{
"w":{
"3":400,
"2":{
"i":[{
"c":14,
"y":"7",
"z":"4"
}
]
}
}
}
}
they are actually single-line rows; I just wrote them here this way for readability.
My question is the following:
Is there any simple way, that doesn't require writing dozens, or hundreds of lines in Python, specific to my file, to convert all these files to one (or multiple, i.e. one per file) csv/excel... ? One example would be using an external library, script... that handles this particular task, regardless of the names of the fields.
The trap is that some elements do not appear in each line. For example, for the "i" key, we have 3 fields (l, c, p) in the first json, and 3 in the second one (c, y, z). Ideally, the csv should contain as many columns as possible fields (e.g. evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z) at the risk of having (many) null values per csv row.
A possible csv output for this example would have the following columns:
u, evv.w.1, evv.w.3, evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z
Any idea/reference is welcome :)
Thanks
No, there is no general-purpose program that does precisely what you ask for.
You can, however, write a Python program that does it.
This program might do what you want. It does not have any code specific to your key names, but it is specific to your file format.
It can take several files on the command line.
Each file is presumed to have one JSON object per line.
It flattens the JSON object, joining labels with "."
import fileinput
import json
import csv

def flattify(d, key=()):
    if isinstance(d, list):
        result = {}
        for i in d:
            result.update(flattify(i, key))
        return result
    if isinstance(d, dict):
        result = {}
        for k, v in d.items():
            result.update(flattify(v, key + (k,)))
        return result
    return {key: d}

total = []
for line in fileinput.input():
    if line.strip():
        line = json.loads(line)
        line = flattify(line)
        line = {'.'.join(k): v for k, v in line.items()}
        total.append(line)

keys = set()
for d in total:
    keys.update(d)

with open('result.csv', 'w') as output_file:
    output_file = csv.DictWriter(output_file, sorted(keys))
    output_file.writeheader()
    output_file.writerows(total)
Please check whether this (Python 3) solution works for you.
import json
import csv

with open('test.json') as data_file:
    with open('output.csv', 'w', newline='') as fp:
        a = csv.writer(fp, delimiter=',')
        for line in data_file:
            data = json.loads(line)
            w = data['evv']['w']
            i0 = w['2']['i'][0]
            output = [[data['u'], w.get('1'), w.get('3'),
                       i0.get('l'), i0.get('c'), i0.get('p'),
                       i0.get('y'), i0.get('z')]]
            a.writerows(output)
test.json
{ "u": "28", "evv": { "w": { "1": 400, "2": { "i": [{ "l": 14, "c": "7", "p": "4" }] } } }}
{"u":"29","evv":{ "w":{ "3":400, "2":{ "i":[{ "c":14, "y":"7", "z":"4" } ] } } }}
output
python3 pyprog.py
dac#dac-Latitude-E7450 ~/P/pyprog> more output.csv
28,400,,14,7,4,,
29,,400,,14,,7,4

Add lines from file as dictionary to list Python

I'm trying to import a file which contains lines like this:
{ "dictitem" : 1, "anotherdictitem" : 2 }
I want to import them in to a list of dictionaries like this:
[{ "dictitem" : "henry", "anotherdictitem" : 2 },{ "dictitem" : "peter", "anotherdictitem" : 4 },{ "dictitem" : "anna", "anotherdictitem" : 6 }]
I tried this: tweetlist = open("sample.out").readlines()
But then they get appended as strings. Does anyone have an idea?
Thanks!
You need to decode each line using json (avoid naming variables list or dict, which shadows the built-ins). Example:
import json

tweets = []
with open("data.txt", 'r') as file:
    for line in file:
        tweets.append(json.loads(line))
print(tweets)
You can use the AST library's literal_eval function on each line by using a list comprehension like this:
import ast
tweetlist = [ast.literal_eval(x) for x in open("sample.out").readlines()]
ast.literal_eval is a safer alternative to eval that only evaluates Python literals and doesn't execute arbitrary code.
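For lines like the sample above (double-quoted keys), json.loads would work just as well; where ast.literal_eval pulls ahead is lines written with Python-style single quotes, which strict JSON rejects. A small comparison (the values here are made up):

```python
import ast
import json

line = "{'dictitem': 'henry', 'anotherdictitem': 2}"  # Python-style single quotes

d = ast.literal_eval(line)  # parses Python literals fine

try:
    json.loads(line)  # strict JSON rejects single-quoted keys/strings
    parsed_by_json = True
except json.JSONDecodeError:
    parsed_by_json = False
```

So for mixed or Python-repr-style input, literal_eval is the more forgiving of the two parsers.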
