Convert CSV to Nested JSON using / header delimiter - python

My CSV headers look something like
from/email
to/0/email
personalization/0/email
personalization/0/data/first_name
personalization/0/data/company_name
personalization/0/data/job_title
template_id
Output should be:
[
    {
        "from": {
            "email": "me#x.com",
            "name": "Me"
        },
        "to": [
            {
                "email": "mike#x.com"
            }
        ],
        "personalization": [
            {
                "email": "mike#x.com",
                "data": {
                    "first_name": "Mike",
                    "company_name": "X.com",
                    "job_title": "Chef"
                }
            }
        ],
        "template_id": "123456"
    },
I tried
csvjson input.csv output.csv
csvtojson input.csv output.csv
csv2json input.csv output.csv
python3 app.py
import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []
    # read csv file
    with open(csvFilePath, encoding='utf-8') as csvf:
        # load csv file data using csv library's dictionary reader
        csvReader = csv.DictReader(csvf)
        # convert each csv row into python dict
        for row in csvReader:
            # add this python dict to json array
            jsonArray.append(row)
    # convert python jsonArray to JSON string and write to file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)

csvFilePath = r'outputt1.csv'
jsonFilePath = r'outputt1.json'
csv_to_json(csvFilePath, jsonFilePath)
node app.js
const CSVToJSON = require('csvtojson');

// convert the CSV file to a JSON array
CSVToJSON().fromFile('outputt1.csv')
    .then(from => {
        // from is a JSON array
        // log the JSON array
        console.log(from);
    }).catch(err => {
        // log error if any
        console.log(err);
    });
All output some variation of single-line JSON with no nesting.
The only thing that worked was uploading it to https://www.convertcsv.com/csv-to-json.htm and converting each file by hand, but that is obviously not a solution.
I have seen a post recommending ChoETL.JSON for this exact purpose, but I was unable to install it on a Mac.

Your problem should be broken down into two parts: parsing CSV data for conversion into JSON, and building a JSON structure following path-like descriptors.
For the first part, it is necessary to clarify the formatting of the CSV input, as there is no general standard for CSV, just a fundamental description in the RFC 4180 proposal and many adaptations tailored to specific use cases or data types. As you didn't provide any actual CSV content, let's assume for the sake of simplicity that records are separated by newlines, fields are separated by commas, and quoted fields are not used, as the data itself never contains any of these separators. Let's further assume that all records have the exact same number of fields, and that exactly one of them (namely the first) represents the headers. You may want to adjust these assumptions to your actual CSV data.
cat input.csv
from/email,to/0/email,personalization/0/email,personalization/0/data/first_name,personalization/0/data/company_name,personalization/0/data/job_title,template_id
me#x.com,mike#x.com,mike#x.com,Mike,X.com,Chef,123456
Based on this formatting, you can read in the CSV data using the --raw-input or -R option which streams in each newline-separated segment of raw text as a JSON string input. Ideally, your filter should then convert each input string record into an array of string fields by splitting at the comma, e.g. using the / operator:
jq -R '. / ","' input.csv
[
  "from/email",
  "to/0/email",
  "personalization/0/email",
  "personalization/0/data/first_name",
  "personalization/0/data/company_name",
  "personalization/0/data/job_title",
  "template_id"
]
[
  "me#x.com",
  "mike#x.com",
  "mike#x.com",
  "Mike",
  "X.com",
  "Chef",
  "123456"
]
As for the second part, you can now easily process these JSON arrays. In order to treat the first one (the headers) separately, you could use the --slurp or -s option which turns the input stream into an array whose elements can then be accessed using indices. Also, the setpath builtin comes in handy as it can set values within a JSON structure described as an array of strings and integers representing object fields and array indices, just as you do in your headers. This leaves you turning the header strings into such arrays by splitting at "/" and converting number-like segments into actual numbers. Finally, to successively build up your JSON objects you could iterate through the record fields using a reduce statement and align the record fields to their corresponding header fields using transpose:
… | jq -s '
  (.[0] | map(. / "/" | map(tonumber? // .))) as $headers
  | .[1:] | map(
      reduce ([$headers, .] | transpose[]) as [$path, $value] (
        {}; setpath($path; $value)
      )
    )
'
[
  {
    "from": {
      "email": "me#x.com"
    },
    "to": [
      {
        "email": "mike#x.com"
      }
    ],
    "personalization": [
      {
        "email": "mike#x.com",
        "data": {
          "first_name": "Mike",
          "company_name": "X.com",
          "job_title": "Chef"
        }
      }
    ],
    "template_id": "123456"
  }
]
Notes
My showcase disregards the fact that your sample JSON output also contains an additional field name under the top-level field from, because your sample CSV input headers don't include a matching header from/name.
To emphasize the bipartite nature of this approach, I concluded with two cascading invocations of jq. These generally could (and mostly should) be combined into one. However, as combining the options --raw-input and --slurp would alter jq's read-in behaviour, you'd rather want to add the --null-input or -n option with [inputs | …] in the first filter, which lets you drop the --slurp option from the second: jq -Rn '[inputs / ","] | …'
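Putting the two filters together under the assumptions above, the combined invocation could look like this (assembled from the two commands shown earlier; adjust the record splitting to your actual CSV dialect):

jq -Rn '
  [inputs / ","]
  | (.[0] | map(. / "/" | map(tonumber? // .))) as $headers
  | .[1:] | map(
      reduce ([$headers, .] | transpose[]) as [$path, $value] (
        {}; setpath($path; $value)
      )
    )
' input.csv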

Related

convert json to csv and store it in a variable in python

I am using the csv module to convert JSON to CSV and store it in a file or print it to stdout.
import csv
import sys

def write_csv(data: list, header: list, path: str = None):
    # data is json format data as list
    output_file = open(path, 'w') if path else sys.stdout
    out = csv.writer(output_file)
    out.writerow(header)
    for row in data:
        out.writerow([row[attr] for attr in header])
    if path:
        output_file.close()
I want to store the converted CSV in a variable instead of sending it to a file or stdout.
Say I want to create a function like this:
def json_to_csv(data: list, header: list):
    # convert json data into csv string
    return string_csv
NOTE: format of data is simple
data is a list of dictionaries of string-to-string mappings
[
    {
        "username": "srbcheema",
        "name": "Sarbjit Singh"
    },
    {
        "username": "testing",
        "name": "Test, user"
    }
]
I want csv output to look like:
username,name
srbcheema,Sarbjit Singh
testing,"Test, user"
Converting JSON to CSV is not a trivial operation. There is also no standardized way to translate between them...
For example
my_json = {
    "one": 1,
    "two": 2,
    "three": {
        "nested": "structure"
    }
}
Could be represented in a number of ways...
These are all (to my knowledge) valid CSVs that contain all the information from the JSON structure.
data
'{"one": 1, "two": 2, "three": {"nested": "structure"}}'
one,two,three
1,2,'{"nested": "structure"}'
one,two,three__nested
1,2,structure
In essence, you will have to figure out the best translation between the two based on your knowledge of the data. There is no right answer on how to go about this.
I'm relatively new to Python, so there's probably a better way, but this works:
def get_safe_string(string):
    return '"' + string + '"' if "," in string else string

def json_to_csv(data):
    csv_keys = data[0].keys()
    header = ",".join(csv_keys)
    res = list(",".join(get_safe_string(row.get(k)) for k in csv_keys) for row in data)
    res.insert(0, header)
    return "\n".join(res)

Invalid Json using json.dump in python3

My generated JSON output fails validation in JSLint with an EOF error.
Here I am using if len(data) != 0: to avoid writing an empty [] to the final output.json file (this works, but I don't know any other way to avoid it).
with open('output.json', 'a') as jsonFile:
    print(data)
    if len(data) != 0:
        json.dump(data, jsonFile, indent=2)
My input data comes one piece at a time from another function, generated inside a for loop.
Sample "data" coming from that function:
print(data)
[{'product': 'food'}, {'price': '$100'}]
[{'product': 'clothing'}, {'price': '$40'}]
...
Can I append these data and make a JSON file under "Store"? What would be the proper practice? Please suggest.
Sample output generated in the output.json file:
[
  {
    "product": "food"
  },
  {
    "price": "$100"
  }
][
  {
    "product": "clothing"
  },
  {
    "price": "$40"
  }
]
Try the jsonlines package; you can install it with pip install jsonlines.
JSON Lines does not put a comma at the end of each line, so you can read and write the exact structure you have, and you would not need any additional merging or formatting.
import jsonlines

with jsonlines.open('output.json') as reader:
    for obj in reader:
        ...  # do something with obj
Similarly, you can dump data with this module's write method.
with jsonlines.open('output.json', mode='w') as writer:
    writer.write(...)
output.jsonl would look like this
[{"product": "food"}, {"price": "$100"}]
[{"product": "clothing"}, {"price": "$40"}]
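For instance (a sketch, assuming the batches arrive exactly as printed above), each call to write appends one line:

import jsonlines

# each write() call emits one JSON document per line
with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write([{'product': 'food'}, {'price': '$100'}])
    writer.write([{'product': 'clothing'}, {'price': '$40'}])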
Yes, you can always club them all together and link them to a key named Store, which makes sense as they are all products in the store.
But I think the format below would be much better, as each product in the store has a defined product name along with its price:
{
  "Store": [
    {
      "product": "food",
      "price": "$100"
    },
    {
      "product": "clothing",
      "price": "$40"
    }
  ]
}
If you do it this way, you need not insert each key/value pair into the JSON individually; instead you can put each product's name and price into a single object and keep appending those objects to the Store list.
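A minimal sketch of that approach (with a hypothetical rows list standing in for whatever your generating function produces), collecting everything first and dumping once:

import json

# hypothetical stand-in for the values your other function generates
rows = [('food', '$100'), ('clothing', '$40')]

store = {"Store": []}
for product, price in rows:
    # one object per product instead of separate key/value fragments
    store["Store"].append({"product": product, "price": price})

# a single dump produces one valid JSON document
with open('output.json', 'w') as f:
    json.dump(store, f, indent=2)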

Convert Array of JSON Objects to CSV - Python [duplicate]

This question already has answers here:
How to read a JSON file containing multiple root elements?
(4 answers)
Closed 4 years ago.
I have converted a simple JSON to CSV successfully, but I am facing an issue when the file contains an array of JSON objects.
I am using the csv module, not pandas, for the conversion.
Please refer to the content below, which shows what gets processed successfully and what fails.
Success (when the file contains a single list/array of JSON objects):
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
Fail :
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
The json.loads function is throwing an exception as follows:
Extra data; line 1 column 6789 (char 1234)
How can I process such files?
EDIT:
This file is flushed using Kinesis Firehose and pushed to S3.
I am using a lambda to download, load, and transform the file,
so it is not a .json file.
Parse each line like so:
import json

with open('input.json') as f:
    for line in f:
        obj = json.loads(line)
Because your file is not valid JSON, you have to read it line by line and convert each line individually to an object.
Or, you can convert your file structure like this...
[
    {
        "value": 0.97,
        "key_1": "value1",
        "key_2": "value2",
        "key_3": "value3",
        "key_11": "2019-01-01T00:05:00Z"
    },
    {
        "value": 0.97,
        "key_1": "value1",
        "key_2": "value2",
        "key_3": "value3",
        "key_11": "2019-01-01T00:05:00Z"
    },
    {
        "value": 0.97,
        "key_1": "value1",
        "key_2": "value2",
        "key_3": "value3",
        "key_11": "2019-01-01T00:05:00Z"
    }
]
and it will be a valid JSON file.
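If you cannot regenerate the file by hand, a small sketch can do that restructuring for you (assuming one JSON array per line, as in your failing sample):

import json

merged = []
with open('input.json') as f:
    for line in f:
        if line.strip():
            # each non-empty line holds one JSON array; collect its objects
            merged.extend(json.loads(line))

with open('fixed.json', 'w') as out:
    json.dump(merged, out, indent=2)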
As tanaydin said, your failing input is not valid json. It should look something like this:
[
    {
        "value": 0.97,
        "key_1": "value1",
        "key_2": "value2",
        "key_3": "value3",
        "key_11": "2019-01-01T00:05:00Z"
    },
    {"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"},
    {"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}
]
I assume you're creating the json output by iterating over a list of objects and calling json.dumps on each one. You should create your list of dictionaries, then call json.dumps on the whole list instead.
list_of_dicts_to_jsonify = []  # a list, not a dict, since we append to it below
object_attributes = ['value', 'key_1', 'key_2', 'key_3', 'key_11']
for item in list_of_objects:
    # convert object to dictionary
    obj_dict = {}
    for k in object_attributes:
        obj_dict[k] = getattr(item, k) or None
    list_of_dicts_to_jsonify.append(obj_dict)
json_output = json.dumps(list_of_dicts_to_jsonify)
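And since the end goal of the question is CSV, here is a minimal sketch that reads the line-delimited arrays and writes them out with csv.DictWriter (assuming flat objects that share the same keys):

import csv
import json

rows = []
with open('input.json') as f:
    for line in f:
        if line.strip():
            rows.extend(json.loads(line))  # one JSON array per line

with open('output.csv', 'w', newline='') as out:
    # derive the header from the union of all keys seen
    writer = csv.DictWriter(out, fieldnames=sorted({k for r in rows for k in r}))
    writer.writeheader()
    writer.writerows(rows)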

Python JSON parser error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I need some help parsing a JSON file. I've tried a couple of different ways to get the data I need. Below is a sample of the code and also a section of the JSON data, but when I run the code I get the error listed above.
There are 500K lines of text in the JSON. It first fails about 1400 lines in, and I can't see anything in that section to indicate why.
I've run it successfully by checking only blocks of JSON up to the first 1400 lines, and I've used a different parser and got the same error.
I'm debating whether it's an error in the code, an error in the JSON, or a result of the JSON being made of different kinds of data, as some (like the example below) is for a forklift and others for fixed machines, but it is all structured just like below.
All help sincerely appreciated.
Code:
import json

file_list = ['filename.txt']  # insert filename(s) here
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_dict = json.load(f)

# list the headlines to be parsed
for distro in distros_dict:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0],
          distro['smoothedPosition'][1], distro['smoothedPosition'][2])
And here is a section of the JSON:
{
    "id": "b4994c877c9c",
    "name": "Trukki_0001",
    "areaId": "Tracking001",
    "areaName": "Ajoneuvo",
    "color": "#FF0000",
    "coordinateSystemId": "CoordSys001",
    "coordinateSystemName": null,
    "covarianceMatrix": [
        0.47,
        0.06,
        0.06,
        0.61
    ],
    "position": [
        33.86,
        33.07,
        2.15
    ],
    "positionAccuracy": 0.36,
    "positionTS": 1489363199493,
    "smoothedPosition": [
        33.96,
        33.13,
        2.15
    ],
    "zones": [
        {
            "id": "Zone001",
            "name": "Halli1"
        }
    ],
    "direction": [
        0,
        0,
        0
    ],
    "collisionId": null,
    "restrictedArea": "",
    "tagType": "VEHICLE_MANNED",
    "drivenVehicleId": null,
    "drivenByEmployeeIds": null,
    "simpleXY": "33|33",
    "EventProcessedUtcTime": "2017-03-13T00:00:00.3175072Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2017-03-13T00:00:00.0470000Z"
}
The actual problem was that the JSON file was encoded in UTF (likely with a byte-order mark) rather than plain ASCII. If you change the encoding using something like Notepad++, it will be solved.
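Alternatively, a sketch handling it from Python itself, assuming the culprit is a UTF-8 byte-order mark ('utf-8-sig' strips the BOM if present and is harmless otherwise):

import json

# 'utf-8-sig' transparently skips a leading UTF-8 BOM
with open('filename.txt', 'r', encoding='utf-8-sig') as f:
    distros_dict = json.load(f)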
Using the file provided, I got it to work by changing distros_dict to a list. In your code you assign distros_dict rather than add to it, so if more than one file were read, it would hold only the last one.
This is my implementation
import json

file_list = ['filename.txt']  # insert filename(s) here
distros_list = []
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_list.append(json.load(f))

# list the headlines to be parsed
for distro in distros_list:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0],
          distro['smoothedPosition'][1], distro['smoothedPosition'][2])
You will be left with a list of dictionaries
I'm guessing that your JSON is actually a list of objects, i.e. the whole stream looks like:
[
    { "x": 1, "y": 2 },
    { "x": 3, "y": 4 },
    ...
]
... with each element being structured like the section you provided above. This is perfectly valid JSON, and if I store it in a file named file.txt and paste your snippet between a set of [ ], thus making it a list, I can parse it in Python. Note, however, that the result will be again a Python list, not a dict, so you'd iterate like this over each list-item:
import json
import pprint

file_list = ['file.txt']

# Just iterate over the file-list like this, no need for range()
for x in file_list:
    with open(x, 'r') as f:
        # distros is a list!
        distros = json.load(f)
        for distro in distros:
            print(distro['name'])
            print(distro['positionTS'])
            print(distro['smoothedPosition'][1])
            pprint.pprint(distro)
Edit: I moved the second for-loop into the loop over the files. This seems to make more sense, as otherwise you'll iterate once over all files, store the last one in distros, then print elements only from the last one. By nesting the loops, you'll iterate over all files, and for each file iterate over all elements in the list. Hat-tip to the commenters for pointing this out!

Multiple jsons to csv

I have multiple files, each containing multiple highly nested JSON rows. The first two rows of one such file look like:
{
    "u": "28",
    "evv": {
        "w": {
            "1": 400,
            "2": {
                "i": [{
                    "l": 14,
                    "c": "7",
                    "p": "4"
                }]
            }
        }
    }
}
{
    "u": "29",
    "evv": {
        "w": {
            "3": 400,
            "2": {
                "i": [{
                    "c": 14,
                    "y": "7",
                    "z": "4"
                }]
            }
        }
    }
}
They are actually single-line rows; I just wrote them here this way for readability.
My question is the following:
Is there any simple way, one that doesn't require writing dozens or hundreds of lines of Python specific to my files, to convert all these files to one csv/excel file (or multiple, i.e. one per input file)? One example would be an external library or script that handles this particular task, regardless of the names of the fields.
The trap is that some elements do not appear in every line. For example, for the "i" key, we have 3 fields (l, c, p) in the first json and 3 different ones (c, y, z) in the second. Ideally, the csv should contain one column per field encountered (e.g. evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z), at the risk of having (many) null values per csv row.
A possible csv output for this example would have the following columns:
u, evv.w.1, evv.w.3, evv.w.2.i.l, evv.w.2.i.c, evv.w.2.i.p, evv.w.2.i.y, evv.w.2.i.z
Any idea/reference is welcome :)
Thanks
No, there is no general-purpose program that does precisely what you ask for.
You can, however, write a Python program that does it.
This program might do what you want. It does not have any code specific to your key names, but it is specific to your file format.
It can take several files on the command line.
Each file is presumed to have one JSON object per line.
It flattens the JSON object, joining labels with "."
import fileinput
import json
import csv

def flattify(d, key=()):
    # list items are merged under the same key path
    if isinstance(d, list):
        result = {}
        for i in d:
            result.update(flattify(i, key))
        return result
    # dict keys extend the key path
    if isinstance(d, dict):
        result = {}
        for k, v in d.items():
            result.update(flattify(v, key + (k,)))
        return result
    # leaf value: keyed by the accumulated path tuple
    return {key: d}

total = []
for line in fileinput.input():
    if line.strip():
        line = json.loads(line)
        line = flattify(line)
        line = {'.'.join(k): v for k, v in line.items()}
        total.append(line)

keys = set()
for d in total:
    keys.update(d)

with open('result.csv', 'w') as output_file:
    output_file = csv.DictWriter(output_file, sorted(keys))
    output_file.writeheader()
    output_file.writerows(total)
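Saved as, say, flattify.py (a hypothetical file name), the script reads the files given on the command line via fileinput and writes result.csv:
python3 flattify.py file1.json file2.json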
Please check if this (python3) solution works for you.
import json
import csv

with open('test.json') as data_file:
    with open('output.csv', 'w', newline='') as fp:
        a = csv.writer(fp, delimiter=',')
        for line in data_file:
            data = json.loads(line)
            output = [[data['u'], data['evv']['w'].get('1'), data['evv']['w'].get('3'),
                       data['evv']['w'].get('2')['i'][0].get('l'), data['evv']['w'].get('2')['i'][0].get('c'),
                       data['evv']['w'].get('2')['i'][0].get('p'), data['evv']['w'].get('2')['i'][0].get('y'),
                       data['evv']['w'].get('2')['i'][0].get('z')]]
            a.writerows(output)
test.json
{ "u": "28", "evv": { "w": { "1": 400, "2": { "i": [{ "l": 14, "c": "7", "p": "4" }] } } }}
{"u":"29","evv":{ "w":{ "3":400, "2":{ "i":[{ "c":14, "y":"7", "z":"4" } ] } } }}
output
python3 pyprog.py
dac@dac-Latitude-E7450 ~/P/pyprog> more output.csv
28,400,,14,7,4,,
29,,400,,14,,7,4
