Flatten nested json to csv with nested column names - python

I have a rather unusual requirement: I have the JSON below and need to convert it into a flat CSV.
[
{
"authorizationQualifier": "SDA",
"authorizationInformation": " ",
"securityQualifier": "ASD",
"securityInformation": " ",
"senderQualifier": "ASDAD",
"senderId": "FADA ",
"receiverQualifier": "ADSAS",
"receiverId": "ADAD ",
"date": "140101",
"time": "0730",
"standardsId": null,
"version": "00501",
"interchangeControlNumber": "123456789",
"acknowledgmentRequested": "0",
"testIndicator": "T",
"functionalGroups": [
{
"functionalIdentifierCode": "ADSAD",
"applicationSenderCode": "ASDAD",
"applicationReceiverCode": "ADSADS",
"date": "20140101",
"time": "07294900",
"groupControlNumber": "123456789",
"responsibleAgencyCode": "X",
"version": "005010X221A1",
"transactions": [
{
"name": "ASDADAD",
"transactionSetIdentifierCode": "adADS",
"transactionSetControlNumber": "123456789",
"implementationConventionReference": null,
"segments": [
{
"BPR03": "ad",
"BPR14": "QWQWDQ",
"BPR02": "1.57",
"BPR13": "23223",
"BPR01": "sad",
"BPR12": "56",
"BPR10": "32424",
"BPR09": "12313",
"BPR08": "DA",
"BPR07": "123456789",
"BPR06": "12313",
"BPR05": "ASDADSAD",
"BPR16": "21313",
"BPR04": "SDADSAS",
"BPR15": "11212",
"id": "aDSASD"
},
{
"TRN02": "2424",
"TRN03": "35435345",
"TRN01": "3435345",
"id": "FSDF"
},
{
"REF02": "fdsffs",
"REF01": "sfsfs",
"id": "fsfdsfd"
},
{
"DTM02": "2432424",
"id": "sfsfd",
"DTM01": "234243"
}
],
"loops": [
{
"id": "24324234234",
"segments": [
{
"N101": "sfsfsdf",
"N102": "sfsf",
"id": "dgfdgf"
},
{
"N301": "sfdssfdsfsf",
"N302": "effdssf",
"id": "fdssf"
},
{
"N401": "sdffssf",
"id": "sfds",
"N402": "sfdsf",
"N403": "23424"
},
{
"PER06": "Wsfsfdsfsf",
"PER05": "sfsf",
"PER04": "23424",
"PER03": "fdfbvcb",
"PER02": "Pedsdsf",
"PER01": "sfsfsf",
"id": "fdsdf"
}
]
},
{
"id": "2342",
"segments": [
{
"N101": "sdfsfds",
"N102": "vcbvcb",
"N103": "dsfsdfs",
"N104": "343443",
"id": "fdgfdg"
},
{
"N401": "dfsgdfg",
"id": "dfgdgdf",
"N402": "dgdgdg",
"N403": "234244"
},
{
"REF02": "23423342",
"REF01": "fsdfs",
"id": "sfdsfds"
}
]
}
]
}
]
}
]
}
]
The column header for a deeper key-value pair may take a nested form, like functionalGroups[0].transactions[0].segments[0].BPR15.
I am able to do this in Java using this GitHub project (its explanation shows the output format I want) in one line:
flatJson = JSONFlattener.parseJson(new File("files/simple.json"), "UTF-8");
The output was:
date,securityQualifier,testIndicator,functionalGroups[1].functionalIdentifierCode,functionalGroups[1].date,functionalGroups[1].applicationReceiverCode, ...
140101,00,T,HP,20140101,ETIN,...
But I want to do this in python. I tried as suggested in this answer:
import json
from pandas import json_normalize

with open('data.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data, record_prefix=True)
with open('temp2.csv', "w", newline='\n') as csv_file:
    csv_file.write(df.to_csv())
However, for column functionalGroups, it dumps json as a cell value.
I also tried as suggested in this answer:
import json
import pandas

with open('data.json') as f:  # this ensures opening and closing file
    a = json.loads(f.read())
df = pandas.DataFrame(a)
print(df.transpose())
But this also seems to do the same:
0
acknowledgmentRequested 0
authorizationInformation
authorizationQualifier SDA
date 140101
functionalGroups [{'functionalIdentifierCode': 'ADSAD', 'applic...
interchangeControlNumber 123456789
receiverId ADAD
receiverQualifier ADSAS
securityInformation
securityQualifier ASD
senderId FADA
senderQualifier ASDAD
standardsId None
testIndicator T
time 0730
version 00501
Is it possible to do what I desire in python?
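One way to get the bracketed column names without pandas is a small recursive flattener. This is only a minimal sketch, not a port of the Java project: `flatten` and `write_flat_csv` are hypothetical helper names, and list indices are 0-based as in the example header above (the Java output shown in the question uses 1-based indices).

```python
import csv
import io
import json


def flatten(node, prefix=""):
    """Yield (column_name, value) pairs for every leaf below node.

    Hypothetical helper: dict keys are joined with '.', list items get
    a 0-based '[i]' suffix. Empty dicts and lists produce no column.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def write_flat_csv(records, out_file):
    """Write one CSV row per top-level record, using the union of all columns."""
    rows = [dict(flatten(record)) for record in records]
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen order
    writer = csv.DictWriter(out_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory demo with a cut-down version of the question's data.
demo = json.loads(
    '[{"date": "140101", "standardsId": null,'
    ' "functionalGroups": [{"version": "005010X221A1",'
    ' "transactions": [{"segments": [{"BPR15": "11212"}]}]}]}]'
)
buffer = io.StringIO()
write_flat_csv(demo, buffer)
print(buffer.getvalue())
```

For the real file, load it with `json.load` and pass the resulting list together with a file opened as `open('out.csv', 'w', newline='')`. Records with different shapes simply leave the missing columns empty.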

Related

Using pandas to convert csv into nested json with dynamic structure

I am new to Python and want to convert a CSV file into a JSON file. The JSON is nested with a dynamic structure, and that structure is defined by the CSV header.
From csv input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
To json output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Can this be done with the existing pandas API, or do I have to write a lot of complex code to construct the JSON object dynamically? Thanks.
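As far as I know, pandas has no single call that rebuilds nested JSON from path-like headers, but a short helper on top of the standard `csv` module may be enough. The sketch below rests on a few assumptions that are not stated in the question: `/` separates nesting levels, a segment made only of digits is a list index, header names are stripped of surrounding spaces, and cells that look numeric are converted to numbers. The names `parse_scalar`, `listify` and `row_to_nested` are hypothetical.

```python
import csv
import io
import json


def parse_scalar(text):
    """Assumption: numeric-looking cells become int or float, others stay strings."""
    for cast in (int, float):
        try:
            return cast(text)
        except (ValueError, TypeError):
            pass
    return text


def listify(node):
    """Recursively turn dicts whose keys are all digits ("0", "1", ...) into lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: listify(value) for key, value in node.items()}
    if converted and all(key.isdigit() for key in converted):
        # Items are ordered by index; gaps in the numbering are not preserved.
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


def row_to_nested(row):
    """Build one nested record from a flat {header: cell} mapping."""
    root = {}
    for header, cell in row.items():
        parts = header.strip().split("/")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = parse_scalar(cell)
    return listify(root)


# In-memory demo with a shortened version of the question's header.
sample = io.StringIO(
    "ID, Name, person_id/id_type,additional_info/0/name,additional_info/0/value\n"
    "1,Peter,PASSPORT,Age,19\n"
)
records = [row_to_nested(row) for row in csv.DictReader(sample)]
print(json.dumps(records, indent=2))
```

Building everything as dicts first and converting to lists at the end keeps the path-walking loop simple. Cell values are not stripped, so a value such as " Monthly" keeps its leading space, as in the expected output above.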

Converting pandas dataframe to desired dictionary of choice

I am new to converting pandas dataframe into json object.
I have a data frame:
Expected json output after conversion is this.
{
"Name": {
"id": "Max"
},
"Favorites" : [
{
"id":"Apple",
"priority":"High",
"Count":"4"
},
{
"id":"Oranges",
"priority":"Medium",
"Count":"2"
},
{
"id":"Banana",
"priority":"Low",
"Count":"1"
}
]
}
Here's a freebie. Hope it helps you learn how to write it yourself in the future :)
output = []
for index, row in df.iterrows():
    entry = {
        "Name": {
            "id": row['Names']
        },
        # "Count" is capitalized to match the expected output in the question
        "Favorites": [
            {
                "id": row['High_Priority_Goods_Name'],
                "priority": "High",
                "Count": row['High_Priority_Goods_Count']
            },
            {
                "id": row['Medium_Priority_Goods_Name'],
                "priority": "Medium",
                "Count": row['Medium_Priority_Goods_Count']
            },
            {
                "id": row['Low_Priority_Goods_Name'],
                "priority": "Low",
                "Count": row['Low_Priority_Goods_Count']
            }
        ]
    }
    output.append(entry)
print(output)

how to convert multi valued CSV to Json

I have a CSV file with four columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I added the header to the example above for clarity; my actual data has no header row.)
How can I convert the above CSV file to JSON in the format below?
1st Column : belongs to --> "type": "Metal"
2nd Column : MetalType: "values" : "value": "abc123451"
3rd column : "Date": "values":"value": "2018-05-26"
4th Column : "Acknowledge": "values":"value": "Success"
All remaining fields take default values, as in the format below:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}
Even though jww is right, I built something for you:
I import the csv using pandas:
import pandas as pd
df = pd.read_csv('data.csv')
then I create a template for the dictionaries you want to add:
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
Now you just need to fill in the dictionary:
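import copy

for i in range(len(df)):
    # Deep-copy the template so every entity gets its own nested dicts;
    # with a plain `d = template` all entries would share one object
    # and end up holding the values of the last row.
    d = copy.deepcopy(template)
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)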
I know my way of iterating over the df is kind of ugly, maybe someone knows a cleaner way.
Cheers!
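Since the question says the real file has no header row, here is a possible variant using only the standard library, building each entity from scratch instead of copying a template. It is a sketch under assumptions: the column order in `COLUMNS` is taken from the example in the question, the "id", "source" and "locale" defaults are copied from the desired output, and the helper names `make_attribute`, `row_to_entity` and `csv_to_entities` are made up for illustration.

```python
import csv
import io
import json

# Assumed column order of the headerless file, taken from the question's example.
COLUMNS = ["type", "MetalType", "Date", "Acknowledge"]


def make_attribute(value):
    """Wrap one cell in the 'values' structure with the default source/locale."""
    return {"values": [{"source": "XYZ", "locale": "Australia", "value": value}]}


def row_to_entity(row):
    """Turn one CSV row (a list of strings) into one entity dict."""
    record = dict(zip(COLUMNS, row))
    return {
        "id": "XXXXXXX",  # placeholder id, as in the desired output
        "type": record["type"],
        "data": {
            "attributes": {
                name: make_attribute(record[name]) for name in COLUMNS[1:]
            }
        },
    }


def csv_to_entities(lines):
    """Read headerless CSV lines and return the full {"entities": [...]} dict."""
    return {"entities": [row_to_entity(row) for row in csv.reader(lines) if row]}


# In-memory demo; for a real file pass open('data.csv', newline='') instead.
sample = io.StringIO("Metal,abc123451,2018-05-26,Success\n")
print(json.dumps(csv_to_entities(sample), indent=2))
```

Because every entity is created by a fresh function call, there is no shared state between rows and no template to copy.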

Json to csv using python

I'm trying to convert JSON to CSV, but the data has a "header". With my current knowledge I can't convert it to CSV, because I don't know how to handle the "header":
`
{
"__metadata": {
"uri": "http://ip:port/vvv/v1/folders?page=1&pagesize=50"
},
"first": {
"__deferred": {
"uri": "http://ip:port/vvv/v1/folders?page=1&pagesize=50"
}
},
"last": {
"__deferred": {
"uri": "http://ip:port/vvv/v1/folders?page=1&pagesize=50"
}
},
"entries": [
`
And the rest of code looks like this:
`
{
"__metadata": {
"uri": "http://ip:port/vvv/v1/folders/13483"
},
"cuid": "AfbTJW3iTE1MkiLULzA6P58",
"name": "Foldername1",
"description": "",
"id": "13483",
"type": "Folder",
"ownerid": "12",
"updated": "Wed Mar 01 09:14:23 CET 2017"
},
{
"__metadata": {
"uri": "http://ip:port/vvv/v1/folders/523"
},
"cuid": "AS1oZEJAynpNjZIaZK2rc7g",
"name": "foldername2",
"description": "",
"id": "523",
"type": "Folder",
"ownerid": "10",
"updated": "Wed Jan 18 00:11:06 CET 2017"
},
{
"__metadata": {
"uri": "http://ip:port/vvv/v1/folders/5356"
},
"cuid": "AeN4lEu0h_tAtnPEjFYxwi8",
"name": "foldername",
"description": "",
"id": "5356",
"type": "Folder",
"ownerid": "12",
"updated": "Fri Feb 10 17:28:53 CET 2017"
}
]
}
`
How can I convert the above data to CSV? How can I deal with the "header"?
Python's json and csv libraries should handle this for you. Just load the json data in and access the entries tag directly. From there you can enumerate all the data and write it to a csv file.
This example shows how to also write all of the data in dataprovider before writing the expression list:
import json
import csv
data = """{
"dataprovider": {
"id": "DP0",
"name": "Query 1",
"dataSourceId": "5430",
"dataSourcePrefix": "DS0",
"dataSourceType": "unv",
"updated": "2010-12-03T13:07:43.000Z",
"duration": 1,
"isPartial": "false",
"rowCount": 1016,
"flowCount": 1,
"dictionary": {
"expression": [{
"#dataType": "String",
"#qualification": "Dimension",
"id": "DP0.DOa5",
"name": "Lines",
"description": "Product line. Each line contains a set of categories.",
"dataSourceObjectId": "DS0.DOa5",
"formulaLanguageId": "[Lines]"
},
{
"#dataType": "Numeric",
"#qualification": "Measure",
"#highPrecision": "false",
"id": "DP0.DO93",
"name": "Sales revenue",
"description": "Sales revenue $ - $ revenue of SKU sold",
"dataSourceObjectId": "DS0.DO93",
"formulaLanguageId": "[Sales revenue]",
"aggregationFunction": "Sum"
}]
},
"query": "SELECT ... FROM ... WHERE"
}
}
"""
my_json = json.loads(data)
entries = my_json['dataprovider']['dictionary']['expression']
# Top-level keys of dataprovider, without the nested "dictionary" block
header_1 = [k for k in my_json['dataprovider'] if k != "dictionary"]
data_1 = [(k, str(my_json['dataprovider'][k])) for k in header_1]
header_2 = sorted(entries[0].keys())
# Python 3: text mode with newline='' lets the csv module control line endings
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    # Write initial header information
    csv_output.writerows(data_1)
    # Write an empty row
    csv_output.writerow([])
    # Write list information
    csv_output.writerow(header_2)
    for entry in entries:
        csv_output.writerow([' '.join(str(entry.get(col, '')).splitlines()) for col in header_2])
The CSV file would then look something like this (the order of the first block of rows follows the dict's key order, so it may differ):
updated,2010-12-03T13:07:43.000Z
name,Query 1
dataSourceType,unv
rowCount,1016
isPartial,false
dataSourceId,5430
query,SELECT ... FROM ... WHERE
duration,1
flowCount,1
dataSourcePrefix,DS0
id,DP0
#dataType,#qualification,dataSourceObjectId,description,formulaLanguageId,id,name
String,Dimension,DS0.DOa5,Product line. Each line contains a set of categories.,[Lines],DP0.DOa5,Lines
Numeric,Measure,DS0.DO93,Sales revenue $ - $ revenue of SKU sold,[Sales revenue],DP0.DO93,Sales revenue
If you are getting different JSON, you need to manually decide which part to extract, for example:
entries = my_json['documents']['document']
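Applied to the JSON from the question, the same idea could look like the sketch below: skip the paging "header" (`__metadata`, `first`, `last`) and write only the items under `entries`. Pulling each entry's `__metadata.uri` into a plain `uri` column is my own assumption about what is wanted, and `entries_to_csv` is a hypothetical helper name.

```python
import csv
import io
import json


def entries_to_csv(document, out_file):
    """Write document["entries"] as CSV, one row per entry."""
    rows = []
    for entry in document["entries"]:
        # Copy the simple fields and flatten the nested __metadata.uri
        # into a "uri" column (an assumption, see the text above).
        row = {key: value for key, value in entry.items() if key != "__metadata"}
        row["uri"] = entry.get("__metadata", {}).get("uri", "")
        rows.append(row)

    # Union of all keys, in first-seen order, so uneven entries still fit.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# In-memory demo with one entry shaped like the question's data.
document = json.loads("""
{
  "__metadata": {"uri": "http://ip:port/vvv/v1/folders?page=1&pagesize=50"},
  "entries": [
    {
      "__metadata": {"uri": "http://ip:port/vvv/v1/folders/13483"},
      "cuid": "AfbTJW3iTE1MkiLULzA6P58",
      "name": "Foldername1",
      "id": "13483"
    }
  ]
}
""")
buffer = io.StringIO()
entries_to_csv(document, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open('output.csv', 'w', newline='')` instead of the `StringIO` buffer.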

How to modify nested JSON with python

I need to update (CRUD) a nested JSON file using Python. I want to be able to call Python functions to create, update, or delete entries and then write the result back to the JSON file.
Here is a sample file.
I am looking at the remap library but not sure if this will work.
{
"groups": [
{
"name": "group1",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
],
"groups": [
{
"name": "group-child",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
]
}
]
},
{
"name": "group2",
"properties": [
{
"name": "Test-Key2-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value2"
}
}
]
}
]
}
I feel like I'm missing something in your question. In any event, what I understand is that you want to read a json file, edit the data as a python object, then write it back out with the updated data?
Read the json file:
import json
with open("data.json") as f:
    data = json.load(f)
That creates a dictionary (given the format you've given) that you can manipulate however you want. Assuming you want to write it out:
with open("data.json","w") as f:
    json.dump(data,f)
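Between the load and the dump, the edits themselves are ordinary dict and list operations. As a rough sketch of what the create/update/delete functions might look like for the sample file, assuming that groups are identified by their "name" and may nest under a "groups" key (the helper names `find_group`, `upsert_property` and `delete_property` are hypothetical):

```python
import json


def find_group(groups, name):
    """Depth-first search for a group by name in a list of nested groups."""
    for group in groups:
        if group.get("name") == name:
            return group
        found = find_group(group.get("groups", []), name)
        if found is not None:
            return found
    return None


def upsert_property(group, prop_name, value):
    """Create the property, or replace its value if it already exists."""
    properties = group.setdefault("properties", [])
    for prop in properties:
        if prop.get("name") == prop_name:
            prop["value"] = value
            return
    properties.append({"name": prop_name, "value": value})


def delete_property(group, prop_name):
    """Remove every property with this name; return how many were removed."""
    properties = group.get("properties", [])
    kept = [prop for prop in properties if prop.get("name") != prop_name]
    removed = len(properties) - len(kept)
    if removed:
        group["properties"] = kept
    return removed


# In-memory demo on a cut-down version of the sample file.
data = json.loads("""
{"groups": [{"name": "group1",
             "properties": [{"name": "Test-Key-Integer",
                             "value": {"type": "Integer", "data": 1000}}],
             "groups": [{"name": "group-child", "properties": []}]}]}
""")
child = find_group(data["groups"], "group-child")
upsert_property(child, "Test-Key-String",
                {"type": "String", "encoding": "utf-8", "data": "value1"})
delete_property(find_group(data["groups"], "group1"), "Test-Key-Integer")
print(json.dumps(data, indent=2))
```

If two groups share a name, `find_group` returns the first match in depth-first order, so a real implementation may need a path of names instead of a single name.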
