I have the following Excel file with two sheets:
and
I want to convert this Excel file to JSON using Python, so that the result looks like this:
{
    "app_id_c": "string",
    "cust_id_n": "string",
    "laa_app_a": "string",
    "laa_promc": "string",
    "laa_branch": "string",
    "laa_app_type_o": "string",
    "los_input_from_sas": [
        {
            "lsi_app_id_": "string",
            "lsi_cust_type_c": "string"
        }
    ]
}
I tried using the built-in Excel-to-JSON library, but it gives me a series of separate JSON objects instead of a nested one, and I can't make the second sheet part of the same JSON.
First of all, please provide a minimal sample that is easy to copy and paste, not images of the data. I have created a minimal sample similar to your images; it doesn't change the solution.
Read the xlsx file and convert each sheet to a list of dictionaries in Python. You will then have objects like these:
sheet1 = [{
    "app_id_c": "116092749",
    "cust_id_n": "95014843",
    "laa_app_a": "36",
    "laa_promc": "504627",
    "laa_branch": "8",
    "laa_app_type_o": "C",
}]
sheet2 = [
    {
        "lsi_app_id_": "116092749",
        "lsi_cust_type_c": "G",
    },
    {
        "lsi_app_id_": "116092749",
        "lsi_cust_type_c": "G",
    },
]
Once you have these objects in Python, you can build the desired JSON structure with the following script:
import json

for i in sheet1:
    # Collect every sheet2 row that belongs to this sheet1 row
    i["los_input_from_sas"] = list()
    for j in sheet2:
        if i["app_id_c"] == j["lsi_app_id_"]:
            i["los_input_from_sas"].append(j)

# indent=4 pretty-prints the nested structure
sheet1 = json.dumps(sheet1, indent=4)
print(sheet1)
And this is the printed output:
[
    {
        "app_id_c": "116092749",
        "cust_id_n": "95014843",
        "laa_app_a": "36",
        "laa_promc": "504627",
        "laa_branch": "8",
        "laa_app_type_o": "C",
        "los_input_from_sas": [
            {
                "lsi_app_id_": "116092749",
                "lsi_cust_type_c": "G"
            },
            {
                "lsi_app_id_": "116092749",
                "lsi_cust_type_c": "G"
            }
        ]
    }
]
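For larger sheets, the nested loop above compares every row of sheet1 with every row of sheet2. A minimal sketch of an alternative, assuming the same key names as in the sample: index sheet2 by lsi_app_id_ once, then attach the matching rows in a single pass. The function name nest_sheets and the second sample id are made up for illustration.

```python
import json
from collections import defaultdict


def nest_sheets(sheet1, sheet2):
    """Attach the matching sheet2 rows to each sheet1 row under los_input_from_sas."""
    # Index sheet2 rows by application id so each lookup is a dict access
    by_app_id = defaultdict(list)
    for row in sheet2:
        by_app_id[row["lsi_app_id_"]].append(row)

    # Build new dicts instead of mutating the input rows
    return [
        {**row, "los_input_from_sas": by_app_id.get(row["app_id_c"], [])}
        for row in sheet1
    ]


sheet1 = [{"app_id_c": "116092749", "cust_id_n": "95014843"}]
sheet2 = [
    {"lsi_app_id_": "116092749", "lsi_cust_type_c": "G"},
    {"lsi_app_id_": "999", "lsi_cust_type_c": "X"},  # hypothetical non-matching row
]
nested = nest_sheets(sheet1, sheet2)
print(json.dumps(nested, indent=4))
```

Unlike the script above, this version returns new dictionaries and leaves sheet1 untouched, which makes it easier to reuse the input lists afterwards.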
UPDATE:
Here are some solutions for reading xlsx files and converting them to Python dicts.
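One possible approach, as a minimal sketch: it assumes the openpyxl package is installed and that the first row of each sheet holds the column names. The function name sheet_to_records is hypothetical, and every cell is cast to a string so that ids from different sheets compare equal.

```python
from openpyxl import load_workbook


def sheet_to_records(path, sheet_name):
    """Read one worksheet into a list of dicts keyed by its header row."""
    # data_only=True returns cached formula results instead of the formulas
    workbook = load_workbook(path, data_only=True)
    rows = workbook[sheet_name].iter_rows(values_only=True)
    header = next(rows, None)
    if header is None:  # no rows at all
        return []
    return [
        # Empty cells become "", everything else is converted to str
        {key: "" if value is None else str(value) for key, value in zip(header, row)}
        for row in rows
    ]
```

With a workbook called data.xlsx (a placeholder name), sheet1 = sheet_to_records("data.xlsx", "Sheet1") and sheet2 = sheet_to_records("data.xlsx", "Sheet2") would produce lists shaped like the ones shown earlier.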
Related
I have a JSON file with mostly simple key-value pairs, but some of the values are arrays. I want to convert the JSON to a simple CSV format so that I can import the values into tables of my PostgreSQL database. The JSON file looks like this:
{
    "name": "Text",
    "operator_type": [
        "one",
        "two",
        "three"
    ],
    "street": "M\u00f6nchhaldenstra\u00dfe ",
    "street_nr": "113",
    "zipcode": "70191",
    "city": "Stuttgart",
    "operator_type_id": [
        "1",
        "2",
        "3"
    ],
    "dc_operator_per": [
        "100",
        "",
        ""
    ],
    "input_power": 600.0,
    "el_power": 800.0,
    "col_power": 300.0
}
To convert it, I'm using this simple method:
import pandas as pd
data=pd.read_json('export.json')
data.to_csv('text.csv')
data=pd.read_csv('text.csv')
But for each array element it creates a new line in the CSV, repeating the values of the fields that are not arrays. Like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
Test,,,600,True,
I want to have it like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
Or, if one object has two operator_types, like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
,Seconlocation,,,
I am trying to convert an Excel file to nested JSON using Python, where the repeated values go in as an array of elements.
Example: structure of the CSV
Manufacturer,oilType,viscosity
shell,superOil,1ova
shell,superOil,2ova
shell,normalOil,1ova
bp,power,10bba
Should be displayed in JSON (expected output) as
elements: [
    {
        "Manufacturer": "shell",
        "details": [
            {
                "OilType": "superOil",
                "Viscosity": [
                    "1ova",
                    "2ova"
                ]
            },
            {
                "OilType": "normalOil",
                "Viscosity": [
                    "1ova"
                ]
            }
        ]
    },
    {
        "Manufacturer": "bp",
        "details": [
            {
                "OilType": "power",
                "Viscosity": [
                    "10bba"
                ]
            }
        ]
    }
]
So far I have converted the CSV into JSON using openpyxl, and each row is output as a flat object keyed by the headers, like this (current output):
[{Manufacturer: "shell", oilType: "superOil", Viscosity:"1ova"},{...},{...},...]
Please help in getting the expected output.
Hi and welcome to StackOverflow.
Your question actually has nothing to do with openpyxl, because you don't need to write to an Excel file.
You can do this, though:
Load the CSV (or Excel file) into a pandas DataFrame
Group by Manufacturer and oil type
Arrange the groups into the structure you want
Transform to JSON (either a string or a file)
In practice, that gives something like this:
import json

import pandas as pd

df = pd.read_csv("oil.csv")  # or read_excel if this is an Excel file
# Collect the viscosities of each (Manufacturer, oilType) pair into a list
oils = df.groupby(["Manufacturer", "oilType"]).aggregate(pd.Series.to_list)
elements = [
    {
        "Manufacturer": manufacturer,
        "details": [
            {"OilType": o, "Viscosity": v}
            for o, v in data.droplevel(0).viscosity.items()
        ],
    }
    for manufacturer, data in oils.groupby(level="Manufacturer")
]
with open("oil.json", "w") as f:
    json.dump({"elements": elements}, f)
For information, oils would look like this:
                           viscosity
Manufacturer oilType
bp           power           [10bba]
shell        normalOil        [1ova]
             superOil   [1ova, 2ova]
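If pandas is not available, the same grouping can be sketched with the standard library alone. This is only an illustration under the assumption that the CSV has exactly the three columns shown in the question; the function name group_oils is made up. Because dictionaries keep insertion order, the manufacturers come out in the order they appear in the file rather than sorted.

```python
import csv
import io
import json


def group_oils(csv_file):
    """Group rows as Manufacturer -> oilType -> list of viscosities."""
    grouped = {}
    # skipinitialspace drops the blank after each comma ("bp, power, 10bba")
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        oil_types = grouped.setdefault(row["Manufacturer"], {})
        oil_types.setdefault(row["oilType"], []).append(row["viscosity"])

    return [
        {
            "Manufacturer": manufacturer,
            "details": [
                {"OilType": oil_type, "Viscosity": viscosities}
                for oil_type, viscosities in oil_types.items()
            ],
        }
        for manufacturer, oil_types in grouped.items()
    ]


sample = io.StringIO(
    "Manufacturer,oilType,viscosity\n"
    "shell,superOil,1ova\n"
    "shell,superOil,2ova\n"
    "shell,normalOil,1ova\n"
    "bp, power, 10bba\n"
)
elements = group_oils(sample)
print(json.dumps({"elements": elements}, indent=2))
```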
I want to convert a CSV file to a JSON file.
I have a large amount of data in the CSV file.
CSV Column Structure
This is the column structure of my CSV file. It has 200+ records.
id.oid libId personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1 _v updatedAt.date
Expected JSON
[{
    "id":
    {
        "$oid": "00001"
    },
    "libId": 11111,
    "personalinfo":
    {
        "Name": "xyz",
        "Roll_NO": 101,
        "addr": "aa bb cc ddd",
        "marks":
        [
            "maths": 80,
            "physic": 90
            .....
        ]
    },
    "clginfo":
    {
        "clgName": "pqr",
        "clgAddr": "qwerty",
        "haveCert": true, //this is boolean true or false
        "certNo": 1, //this could be 1-10
        "certificates":
        [
            {
                "cert_name_1": "xxx",
                "cert_no_1": 12345,
                "cert_exp_1": "20/2/20202"
            },
            {
                "cert_name_2": "xxx",
                "cert_no_2": 12345,
                "cert_exp_2": "20/2/20202"
            },
            ......//could be up to 10
        ],
        "isDept": true, //this is boolean true or false
        "NoofDept": 1, //this could be 1-10
        "DeptDetails":
        [
            {
                "DeptName_1": "yyy",
                "location_1": "zzz",
                "establish_date_1": "1/1/1919"
            },
            ......//up to 10 records
        ]
    },
    "__v": 1,
    "updatedAt":
    {
        "$date": "2022-02-02T13:35:59.843Z"
    }
}]
I have tried using pandas, but I'm getting this output instead:
My output
[{
    "id.$oid": "00001",
    "libId": 11111,
    "personalinfo.Name": "xyz",
    "personalinfo.Roll_NO": 101,
    "personalinfo.addr": "aa bb cc ddd",
    "personalinfo.marks.maths": 80,
    "personalinfo.marks.physic": 90,
    "clginfo.clgName": "pqr",
    "clginfo.clgAddr": "qwerty",
    "clginfo.haveCert": true,
    "clginfo.certNo": 1,
    "clginfo.certificates.cert_name_1": "xxx",
    "clginfo.certificates.cert_no_1": 12345,
    "clginfo.certificates.cert_exp_1": "20/2/20202",
    "clginfo.certificates.cert_name_2": "xxx",
    "clginfo.certificates.cert_no_2": 12345,
    "clginfo.certificates.cert_exp_2": "20/2/20202",
    "clginfo.isDept": true,
    "clginfo.NoofDept": 1,
    "clginfo.DeptDetails.DeptName_1": "yyy",
    "clginfo.DeptDetails.location_1": "zzz",
    "clginfo.DeptDetails.establish_date_1": "1/1/1919",
    "__v": 1,
    "updatedAt.$date": "2022-02-02T13:35:59.843Z"
}]
I am new to Python and only know the basics. Please help me get the expected output.
200+ records is really tiny, so even a naive solution is good enough.
It can't be totally generic, because nothing in the headers says that certificates is a list, unless we rely on every name under certificates ending in _N.
Proposed solution using only basic Python:
Read the header row and split every column name on the period. Iterate over the resulting lists and create nested dicts with the appropriate keys and dummy values (to handle lists: create a list when the current key ends with _N, and use N as the index).
For every row:
    clone the dictionary with dummy values
    for each column, use the split keys from above to put the value into the corresponding dict (lists are handled the same way as above)
    append the dictionary to the list of rows
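The steps above can be sketched roughly as follows. This is a simplified variant, not the exact procedure: instead of cloning a template with dummy values, it builds the nested dictionary directly for each row. It assumes that a last key ending in _N (with N starting at 1) marks an item of a list named after the parent key, and it leaves all values as strings. The names nest_row and rows_to_nested are made up for illustration.

```python
import csv
import io
import re

# Matches leaf keys such as cert_name_1; Roll_NO does not match (no digits)
INDEXED_KEY = re.compile(r"_(\d+)$")


def nest_row(row):
    """Turn one flat row ({'a.b.c': value}) into nested dicts and lists."""
    nested = {}
    for column, value in row.items():
        *parents, leaf = column.split(".")
        match = INDEXED_KEY.search(leaf)
        node = nested
        if match and parents:
            # clginfo.certificates.cert_name_1 -> nested["clginfo"]["certificates"][0]
            *parents, list_key = parents
            for key in parents:
                node = node.setdefault(key, {})
            items = node.setdefault(list_key, [])
            index = int(match.group(1)) - 1
            while len(items) <= index:
                items.append({})
            items[index][leaf] = value
        else:
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = value
    return nested


def rows_to_nested(csv_file):
    """Read a CSV with dotted column names into a list of nested dicts."""
    return [nest_row(row) for row in csv.DictReader(csv_file)]


sample = io.StringIO(
    "id.oid,libId,personalinfo.marks.maths,"
    "clginfo.certificates.cert_name_1,clginfo.certificates.cert_no_1,"
    "clginfo.certificates.cert_name_2\n"
    "00001,11111,80,xxx,12345,yyy\n"
)
records = rows_to_nested(sample)
print(records)
```

Converting the strings to numbers or booleans, and renaming keys such as oid to $oid, would be a separate step after nesting.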
I used to have a simple row-and-column table format and was reading it with pandasql.
But if you have a structure like the one below and want to get age > 10 from it, how do you do that using pandasql?
[ {
    "response": {
        "version": "1.1",
        "token": "dsfgf",
        "body": {
            "customer": {
                "customer_id": "1234567",
                "verified": "true"
            },
            "contact": {
                "email": "mr#abc.com",
                "mobile_number": "0123456789"
            },
            "personal": {
                "gender": "m",
                "title": "Dr.",
                "last_name": "Muster",
                "first_name": "Max",
                "family_status": "single",
                "dob": "1985-12-23"
            }
        }
    }
} ]
This is answered here:
Pandas read nested json
You can use json_normalize
There is also a full description of the same issue here:
https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e
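As a rough illustration of the json_normalize route, the sketch below flattens a trimmed copy of the structure from the question and filters on an age derived from dob. The reference date is an arbitrary assumption chosen to keep the example reproducible, and the age is only approximate (whole days divided by 365). sep="_" is used so that the flattened column names contain no dots.

```python
import pandas as pd

records = [
    {
        "response": {
            "version": "1.1",
            "body": {
                "customer": {"customer_id": "1234567", "verified": "true"},
                "personal": {
                    "last_name": "Muster",
                    "first_name": "Max",
                    "dob": "1985-12-23",
                },
            },
        }
    }
]

# Nested keys are joined with "_", e.g. response_body_personal_dob
df = pd.json_normalize(records, sep="_")

# Approximate age in whole years relative to a fixed reference date
reference = pd.Timestamp("2024-01-01")
dob = pd.to_datetime(df["response_body_personal_dob"])
df["age"] = (reference - dob).dt.days // 365

older_than_10 = df[df["age"] > 10]
print(older_than_10[["response_body_customer_customer_id", "age"]])
```

Once the data is a flat DataFrame like df, it can be queried in the same way as the simple row-and-column tables mentioned in the question.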
I have a JSON payload that is used for a service request. After processing, that payload (JSON) is stored in S3, and through Athena we can download the data in CSV format. In the actual scenario there are more than 100 fields. I want to verify their values with an automated script instead of manually.
Say my sample payload is similar to the following:
{
    "BOOK": {
        "serialno": "123",
        "author": "xyz",
        "yearofpublish": "2015",
        "price": "16"
    }, "Author": [
        {
            "isbn": "xxxxx", "title": "first", "publisher": "xyz", "year": "2020"
        }, {
            "isbn": "yyyy", "title": "second", "publisher": "zmy", "year": "2019"
        }
    ]
}
The sample CSV will look like the following:
Can anyone please help me with how exactly I can do this in Python? Maybe with a library or a dictionary?
It looks like you just want to flatten out the JSON structure. It'll be easiest to loop over the "Author" list. Since the CSV has renamed the columns, you'll need some way to represent that mapping. Based only on the example, this works:
import json

# some_json_file is the path to the payload file
with open(some_json_file, 'r') as fin:
    j = json.load(fin)

result = []
for author in j['Author']:
    # One output row per author, with the book fields repeated on each row
    val = {'book_serialno': j['BOOK']['serialno'],
           'book_author': j['BOOK']['author'],
           'book_yearofpublish': j['BOOK']['yearofpublish'],
           'book_price': j['BOOK']['price'],
           'author_isbn': author['isbn'],
           'author_title': author['title'],
           'author_publisher': author['publisher'],
           'author_year': author['year']}
    result.append(val)
This is using a dictionary to show the mapping of data points to the new column names. You might be able to get away with using a list as well. Depends how you want to use it later on. To write to a CSV:
import csv

# newline='' stops the csv module from writing blank lines on Windows
with open(some_csv_file, 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerow(result[0].keys())
    writer.writerows(r.values() for r in result)
This writes the column names in the first row, then the data. If you don't want the column names, just leave out the writerow(...) line.
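With more than 100 fields, writing the mapping out by hand gets tedious. A minimal sketch of one alternative, assuming the CSV column names always follow the book_<key> / author_<key> pattern from the example: generate the prefixed names from the payload and let csv.DictWriter match values to columns by name. The function name flatten_payload is made up, and the CSV is written to an in-memory buffer here only to keep the example self-contained.

```python
import csv
import io
import json

payload = json.loads("""
{
    "BOOK": {"serialno": "123", "author": "xyz", "yearofpublish": "2015", "price": "16"},
    "Author": [
        {"isbn": "xxxxx", "title": "first", "publisher": "xyz", "year": "2020"},
        {"isbn": "yyyy", "title": "second", "publisher": "zmy", "year": "2019"}
    ]
}
""")


def flatten_payload(payload):
    """Return one flat dict per entry in payload['Author'], with prefixed keys."""
    book = {f"book_{key}": value for key, value in payload["BOOK"].items()}
    return [
        {**book, **{f"author_{key}": value for key, value in author.items()}}
        for author in payload["Author"]
    ]


rows = flatten_payload(payload)

buffer = io.StringIO()
# DictWriter places each value under its column name, whatever the dict order
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because each row is a dictionary keyed by column name, the same rows can also be compared field by field against the rows read back from the downloaded CSV with csv.DictReader.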