Read JSON file to dataframe

Read JSON file to dataframe - python

I have JSON with the structure below, loaded into a Python variable. I'd like to extract its values as a table. How can I pull the records out of the nested list and convert them into a table?
Once it is converted, I will insert the output into a SQL table.
JSON structure:
{
"Details": [
{
"List": [
{
"year": "2018-19",
"Title": "PLANTN-OTRWRKS"
},
{
"year": "2018-19",
"Title": "EXTERNAL"
},
{
"year": "2019-20",
"Title": "INTERNAL"
},
{
"year": "2020-21",
"Title": "BIANNUAL"
},
{
"year": "2022-23",
"Title": "WORKS OF 2017-18 AND 2018-19"
}
],
"Wise": [
{
"Circle": "dgrgr",
"ID": "912",
"total_seedlings_evaluated": "5270",
"Average_height_in_meters": "2.53",
"Average_collar_girth_in_cms": "11.32",
"Survival_perc": "86.79",
"Condition": "Very Good"
},
{
"Circle": "hgrj",
"ID": "4654",
"total_seedlings_evaluated": "117206",
"Average_height_in_meters": "3.04",
"Average_collar_girth_in_cms": "22.61",
"Survival_perc": "71.61",
"Condition": "Good"
}
]
}
]
}
Desired Output: Output
I used Python and the following code:
import pandas as pd
df_table = pd.json_normalize(jsonObj['Details'])
df_table1 = pd.DataFrame(df_table['Wise'], index=df_table.index)
But it didn't work.
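One possible approach, sketched under the assumption that jsonObj holds the parsed dictionary shown above: each entry of "Details" keeps its records in a list under "Wise", so normalizing "Details" alone leaves that whole list packed into a single cell. Passing record_path asks pandas to emit one row per record instead. The trimmed sample data, the float conversion, and the commented-out to_sql call (with a hypothetical table name and engine) are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Trimmed copy of the structure from the question.
jsonObj = {
    "Details": [
        {
            "List": [
                {"year": "2018-19", "Title": "PLANTN-OTRWRKS"},
                {"year": "2018-19", "Title": "EXTERNAL"},
            ],
            "Wise": [
                {"Circle": "dgrgr", "ID": "912",
                 "Survival_perc": "86.79", "Condition": "Very Good"},
                {"Circle": "hgrj", "ID": "4654",
                 "Survival_perc": "71.61", "Condition": "Good"},
            ],
        }
    ]
}

# One row per record found under Details -> Wise.
df_wise = pd.json_normalize(jsonObj, record_path=["Details", "Wise"])

# One row per record found under Details -> List.
df_list = pd.json_normalize(jsonObj, record_path=["Details", "List"])

# Every value in the source is a string; convert numeric columns as needed.
df_wise["Survival_perc"] = df_wise["Survival_perc"].astype(float)

print(df_wise)
print(df_list)

# Writing to SQL could then use DataFrame.to_sql, for example:
# df_wise.to_sql("wise", engine, if_exists="append", index=False)
# ("wise" and `engine` are hypothetical; `engine` would be a SQLAlchemy engine.)
```

Keeping "Wise" and "List" as two separate frames seems the safer choice here, since the sample gives no key that links a record in one list to a record in the other.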

Related

Using pandas to convert csv into nested json with dynamic structure

I am new to Python and want to convert a CSV file into a JSON file. The JSON is nested and has a dynamic structure, which is defined by the CSV header.
From csv input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
To json output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Can this be done with the existing pandas API, or do I have to write a lot of complex code to construct the JSON object dynamically? Thanks.
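A plain-Python sketch of one way to build such a structure, using only the standard library rather than a pandas API. It assumes that every header is a "/"-separated path and that a purely numeric path segment means a list index. The helper names convert, set_path and rows_to_nested are hypothetical, and the sample uses a subset of the columns from the question.

```python
import csv
import io
import json

# Subset of the columns from the question, including the stray spaces
# that follow some commas in the header line.
CSV_TEXT = """ID, Name, person_id/id_type, person_id/id_value,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/amount
1,Peter,PASSPORT,A452817,Age,19,Gender,M,Manager,8956.23
2,Jane,PASSPORT,B859804,Age,38,Gender,F,Worker,125980.1
"""


def convert(text):
    """Return an int or float when the text parses as one, else the text itself."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def set_path(root, path, value):
    """Store value under a path such as ['additional_info', '0', 'name']."""
    node = root
    for i, part in enumerate(path):
        is_last = i == len(path) - 1
        if isinstance(node, list):
            key = int(part)
            # Grow the list until the requested index exists.
            while len(node) <= key:
                node.append(None)
            missing = node[key] is None
        else:
            key = part
            missing = key not in node
        if is_last:
            node[key] = value
        else:
            if missing:
                # A numeric next segment means the child is a list.
                node[key] = [] if path[i + 1].isdigit() else {}
            node = node[key]


def rows_to_nested(csv_text):
    """Convert CSV text with path-style headers into a list of nested dicts."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip().split("/") for name in next(reader)]
    records = []
    for row in reader:
        record = {}
        for path, cell in zip(header, row):
            set_path(record, path, convert(cell))
        records.append(record)
    return records


records = rows_to_nested(CSV_TEXT)
print(json.dumps(records, indent=2))
```

The type guessing in convert is deliberately naive: an identifier such as "007" would become the integer 7, so columns that must stay text would need to be excluded from it.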

replace nested document array mongodb with python

I have this document in MongoDB:
{
"_id": {
"$oid": "62644af0368cb0a46d7c2a95"
},
"insertionData": "23/04/2022 19:50:50",
"ipfsMetadata": {
"Name": "data.json",
"Hash": "Qmb3FWgyJHzJA7WCBX1phgkV93GiEQ9UDWUYffDqUCbe7E",
"Size": "431"
},
"metadata": {
"sessionDate": "20220415 17:42:55",
"dataSender": "user345",
"data": {
"height": "180",
"weight": "80"
},
"addtionalInformation": [
{
"name": "poolsize",
"value": "30m"
},
{
"name": "swimStyle",
"value": "mariposa"
},
{
"name": "modality",
"value": "swim"
},
{
"name": "gender-title",
"value": "schoolA"
}
]
},
"fileId": {
"$numberLong": "4"
}
}
I want to update an element of the nested array, for instance the one whose name is gender-title. It currently has the value schoolA, and I want to change it to the value given in the request body. I pass the fileId as a parameter in the post request and send the name and the new value in the body.
Post request: localhost/sessionUpdate/4
Body:
{
"name": "gender-title",
"value": "adultos"
}
Flask:
@app.route('/sessionUpdate/<string:a>', methods=['PUT'])
def sessionUpdate(a):
    datas = request.json
    r = str(datas['name'])
    r2 = str(datas['value'])
    print(r, r2)
    r3 = collection.update_one({'fileId': a, 'metadata.addtionalInformation': r}, {'$set': {'metadata.addtionalInformation.$.value': r2}})
    return str(r3), 200
I'm getting a 200, but the document doesn't update with the new value.

Since you are using the positional operator $ to update an array element, make sure your query filter matches a field of that element. The query below targets the metadata.addtionalInformation array with the condition name: "gender-title":
db.collection.update({
"fileId": 4,
"metadata.addtionalInformation.name": "gender-title"
},
{
"$set": {
"metadata.addtionalInformation.$.value": "junior"
}
})
Here is the Mongo playground for your reference.
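Carried over to the Flask handler from the question, the same idea might look like the sketch below. The helper name build_array_update is hypothetical; it only builds the two documents that would be handed to pymongo's update_one, so it runs without a database. It also converts the URL parameter to an integer, on the assumption that fileId is stored as a number (the document shows $numberLong), in which case the string "4" would never match.

```python
def build_array_update(file_id, name, value):
    """Return the (filter, update) documents for Collection.update_one."""
    query = {
        # The route passes the id as a string; the stored value is numeric.
        "fileId": int(file_id),
        # Match on a field of the array element so that "$" can resolve to it.
        "metadata.addtionalInformation.name": name,
    }
    update = {"$set": {"metadata.addtionalInformation.$.value": value}}
    return query, update


query, update = build_array_update("4", "gender-title", "adultos")
print(query)
print(update)

# Inside the handler, assuming `collection` is a pymongo Collection:
# result = collection.update_one(query, update)
# result.matched_count shows whether the filter matched any document.
```

Returning a response based on matched_count rather than an unconditional 200 would make a non-matching filter visible to the caller.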

How to read fields without numeric index in JSON

I have a JSON file that I need to read in a structured way so that each value can be inserted into its own database column. In the "customFields" tag, however, the position of each field changes: "Tribe / Customer" can be at index 0 (row['customFields'][0]) in one JSON block and at index 3 (row['customFields'][3]) in another. So I tried to read the data by field name, row['customFields']['Tribe / Customer'], but I got the error below:
TypeError: list indices must be integers or slices, not str
Script:
import json
def getCustomField(ModelData):
    for row in ModelData["data"]["squads"][0]["cards"]:
        print(row['identifier'],
              row['customFields']['Tribe / Customer'],
              row['customFields']['Stopped with'],
              row['customFields']['Sub-Activity'],
              row['customFields']['Activity'],
              row['customFields']['Complexity'],
              row['customFields']['Effort'])
if __name__ == "__main__":
    f = open('test.json')
    json_file = json.load(f)
    getCustomField(json_file)
JSON:
{
"data": {
"squads": [
{
"name": "TESTE",
"cards": [
{
"identifier": "0102",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
},
{
"identifier": "0103",
"title": "TESTE",
"description": " TESTE ",
"status": "on_track",
"priority": null,
"assignees": [
{
"fullname": "TESTE",
"email": "TESTE"
}
],
"createdAt": "2020-04-16T15:00:31-03:00",
"secondaryLabel": null,
"primaryLabels": [
"TESTE",
"TESTE"
],
"swimlane": "TESTE",
"workstate": "Active",
"customFields": [
{
"name": "Tribe / Customer",
"value": "TESTE 1"
},
{
"name": "Stopped with",
"value": null
},
{
"name": "Checkpoint",
"value": "GNN"
},
{
"name": "Sub-Activity",
"value": "DEPLOY"
},
{
"name": "Activity",
"value": "TOOL"
},
{
"name": "Complexity",
"value": "HIGH"
},
{
"name": "Effort",
"value": "20"
}
]
}
]
}
]
}
}

You'll have to parse the list of custom fields into something you can access by name. Since you're accessing multiple entries from the same list, a dictionary is the most appropriate choice.
for row in ModelData["data"]["squads"][0]["cards"]:
    custom_fields_dict = {field['name']: field['value'] for field in row['customFields']}
    print(row['identifier'],
          custom_fields_dict['Tribe / Customer'],
          ...
          )
If you only wanted a single field you could traverse the list looking for a match, but it would be less efficient to do that repeatedly.
I'm skipping over dealing with missing fields - you'd probably want to use get('Tribe / Customer', some_reasonable_default) if there's any possibility of the field not being present in the json list.
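A small sketch of the dictionary approach combined with get for missing fields. The sample is a trimmed stand-in for the JSON in the question, and it assumes, purely for illustration, that the second card has no "Effort" entry. The function name card_rows and the WANTED tuple are hypothetical.

```python
import json

# Two trimmed cards: the custom fields come in a different order in each,
# and the second card has no "Effort" entry (an assumption for this sketch).
SAMPLE = json.loads("""
{"data": {"squads": [{"cards": [
  {"identifier": "0102", "customFields": [
    {"name": "Tribe / Customer", "value": "TESTE 1"},
    {"name": "Stopped with", "value": null},
    {"name": "Effort", "value": "20"}]},
  {"identifier": "0103", "customFields": [
    {"name": "Stopped with", "value": null},
    {"name": "Tribe / Customer", "value": "TESTE 1"}]}
]}]}}
""")

WANTED = ("Tribe / Customer", "Stopped with", "Effort")


def card_rows(model_data, wanted=WANTED, default=None):
    """Return one tuple per card: identifier followed by the wanted fields."""
    rows = []
    for card in model_data["data"]["squads"][0]["cards"]:
        # Index the custom fields by name so that their position no longer matters.
        by_name = {field["name"]: field["value"] for field in card["customFields"]}
        rows.append((card["identifier"],
                     *[by_name.get(name, default) for name in wanted]))
    return rows


rows = card_rows(SAMPLE)
for row in rows:
    print(row)
```

Each tuple has a fixed column order, so it can be passed directly as the parameters of a database insert.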

how to convert multi valued CSV to Json

I have a CSV file with 4 columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I included a header in the example data above, but my actual data has no header.)
How can I convert the above CSV file to JSON in the format below?
1st column: belongs to --> "type": "Metal"
2nd column: "MetalType": "values": "value": "abc123451"
3rd column: "Date": "values": "value": "2018-05-26"
4th column: "Acknowledge": "values": "value": "Success"
All the remaining fields hold default values.
The format is as follows:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}

Even though jww is right, I built something for you:
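I import the CSV using pandas (the actual file has no header row, so the column names are passed explicitly):
import pandas as pd
df = pd.read_csv('data.csv', header=None, names=['type', 'MetalType', 'Date', 'Acknowledge'])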
then I create a template for the dictionaries you want to add:
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
Now you just need to fill in the dictionary, taking a deep copy of the template for each row so that the entries do not all share (and overwrite) the same object:
import copy
for i in range(len(df)):
    d = copy.deepcopy(template)
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)
I know my way of iterating over the df is kind of ugly, maybe someone knows a cleaner way.
Cheers!
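One possibly cleaner variant, sketched with only the standard library: a function builds a fresh entity for every row, so no template has to be copied, and the csv module reads the headerless file directly. The function names make_entity and csv_to_entities are hypothetical, and the sample rows are taken from the question.

```python
import csv
import io
import json

# Headerless sample rows, as described in the question.
CSV_TEXT = """Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Iron,abc123455,2018-05-29,Success
"""

# Attribute names for columns 2 to 4, in file order.
ATTRIBUTES = ("MetalType", "Date", "Acknowledge")


def make_entity(row):
    """Build a new entity dict from one CSV row."""
    entity_type, *values = row
    return {
        "id": "XXXXXXX",
        "type": entity_type,
        "data": {
            "attributes": {
                name: {"values": [{"source": "XYZ",
                                   "locale": "Australia",
                                   "value": value}]}
                for name, value in zip(ATTRIBUTES, values)
            }
        },
    }


def csv_to_entities(text):
    """Convert headerless CSV text into the {"entities": [...]} structure."""
    reader = csv.reader(io.StringIO(text))
    return {"entities": [make_entity(row) for row in reader if row]}


result = csv_to_entities(CSV_TEXT)
print(json.dumps(result, indent=2))
```

For a real file, open('data.csv', newline='') could be passed to csv.reader in place of the io.StringIO wrapper.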

Flatten nested json to csv with nested column names

I have a rather odd requirement. I have the JSON below and have to convert it into a flat CSV.
[
{
"authorizationQualifier": "SDA",
"authorizationInformation": " ",
"securityQualifier": "ASD",
"securityInformation": " ",
"senderQualifier": "ASDAD",
"senderId": "FADA ",
"receiverQualifier": "ADSAS",
"receiverId": "ADAD ",
"date": "140101",
"time": "0730",
"standardsId": null,
"version": "00501",
"interchangeControlNumber": "123456789",
"acknowledgmentRequested": "0",
"testIndicator": "T",
"functionalGroups": [
{
"functionalIdentifierCode": "ADSAD",
"applicationSenderCode": "ASDAD",
"applicationReceiverCode": "ADSADS",
"date": "20140101",
"time": "07294900",
"groupControlNumber": "123456789",
"responsibleAgencyCode": "X",
"version": "005010X221A1",
"transactions": [
{
"name": "ASDADAD",
"transactionSetIdentifierCode": "adADS",
"transactionSetControlNumber": "123456789",
"implementationConventionReference": null,
"segments": [
{
"BPR03": "ad",
"BPR14": "QWQWDQ",
"BPR02": "1.57",
"BPR13": "23223",
"BPR01": "sad",
"BPR12": "56",
"BPR10": "32424",
"BPR09": "12313",
"BPR08": "DA",
"BPR07": "123456789",
"BPR06": "12313",
"BPR05": "ASDADSAD",
"BPR16": "21313",
"BPR04": "SDADSAS",
"BPR15": "11212",
"id": "aDSASD"
},
{
"TRN02": "2424",
"TRN03": "35435345",
"TRN01": "3435345",
"id": "FSDF"
},
{
"REF02": "fdsffs",
"REF01": "sfsfs",
"id": "fsfdsfd"
},
{
"DTM02": "2432424",
"id": "sfsfd",
"DTM01": "234243"
}
],
"loops": [
{
"id": "24324234234",
"segments": [
{
"N101": "sfsfsdf",
"N102": "sfsf",
"id": "dgfdgf"
},
{
"N301": "sfdssfdsfsf",
"N302": "effdssf",
"id": "fdssf"
},
{
"N401": "sdffssf",
"id": "sfds",
"N402": "sfdsf",
"N403": "23424"
},
{
"PER06": "Wsfsfdsfsf",
"PER05": "sfsf",
"PER04": "23424",
"PER03": "fdfbvcb",
"PER02": "Pedsdsf",
"PER01": "sfsfsf",
"id": "fdsdf"
}
]
},
{
"id": "2342",
"segments": [
{
"N101": "sdfsfds",
"N102": "vcbvcb",
"N103": "dsfsdfs",
"N104": "343443",
"id": "fdgfdg"
},
{
"N401": "dfsgdfg",
"id": "dfgdgdf",
"N402": "dgdgdg",
"N403": "234244"
},
{
"REF02": "23423342",
"REF01": "fsdfs",
"id": "sfdsfds"
}
]
}
]
}
]
}
]
}
]
The column header for a deeper key-value pair may take a nested form, like functionalGroups[0].transactions[0].segments[0].BPR15.
I am able to do this in Java using this GitHub project (its explanation shows the output format I want) in one line:
flatJson = JSONFlattener.parseJson(new File("files/simple.json"), "UTF-8");
The output was:
date,securityQualifier,testIndicator,functionalGroups[1].functionalIdentifierCode,functionalGroups[1].date,functionalGroups[1].applicationReceiverCode, ...
140101,00,T,HP,20140101,ETIN,...
But I want to do this in Python. I tried the approach suggested in this answer:
with open('data.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data, record_prefix=True)
with open('temp2.csv', "w", newline='\n') as csv_file:
    csv_file.write(df.to_csv())
However, for the functionalGroups column, it dumps JSON as a cell value.
I also tried the approach suggested in this answer:
with open('data.json') as f:  # this ensures opening and closing file
    a = json.loads(f.read())
df = pandas.DataFrame(a)
print(df.transpose())
But this also seems to do the same:
                          0
acknowledgmentRequested   0
authorizationInformation
authorizationQualifier    SDA
date                      140101
functionalGroups          [{'functionalIdentifierCode': 'ADSAD', 'applic...
interchangeControlNumber  123456789
receiverId                ADAD
receiverQualifier         ADSAS
securityInformation
securityQualifier         ASD
senderId                  FADA
senderQualifier           ASDAD
standardsId               None
testIndicator             T
time                      0730
version                   00501
Is it possible to do what I desire in python?
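A minimal sketch of one way this could be done in plain Python: a recursive generator walks the structure and builds column names in the functionalGroups[0].transactions[0].segments[0].BPR15 style described above, using 0-based indices. The function name flatten is hypothetical, and the data is a heavily trimmed stand-in for the document in the question.

```python
import csv
import io


def flatten(value, prefix=""):
    """Yield (column_name, scalar) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Trimmed stand-in for the JSON in the question.
data = [
    {
        "date": "140101",
        "testIndicator": "T",
        "functionalGroups": [
            {
                "date": "20140101",
                "transactions": [
                    {
                        "segments": [
                            {"BPR15": "11212", "id": "aDSASD"},
                            {"TRN02": "2424", "id": "FSDF"},
                        ]
                    }
                ],
            }
        ],
    }
]

# One flat dict per top-level record.
rows = [dict(flatten(record)) for record in data]

# Union of all column names, in first-seen order.
columns = list(dict.fromkeys(name for row in rows for name in row))

# Write to an in-memory buffer; a real file opened with newline='' works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Empty lists and empty dictionaries produce no column in this sketch, and records that lack a column get an empty cell through restval.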
