I have a JSON file with mostly simple key/value pairs, but some of the values are arrays. I want to convert the JSON to a simple CSV format in order to import the values into tables of my PostgreSQL database. The JSON file looks like this:
{
    "name": "Text",
    "operator_type": [
        "one",
        "two",
        "three"
    ],
    "street": "M\u00f6nchhaldenstra\u00dfe ",
    "street_nr": "113",
    "zipcode": "70191",
    "city": "Stuttgart",
    "operator_type_id": [
        "1",
        "2",
        "3"
    ],
    "dc_operator_per": [
        "100",
        "",
        ""
    ],
    "input_power": 600.0,
    "el_power": 800.0,
    "col_power": 300.0
}
To convert it, I'm using this simple approach:
import pandas as pd

data = pd.read_json('export.json')
data.to_csv('text.csv')
data = pd.read_csv('text.csv')
But for each array element it creates a new row in the CSV, repeating the values that are not arrays. Like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
Test,,,600,True,
I want to have it like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
Or, if one object has two operator_types, then like this:
name,dc_operator_type,capacity_kwh,export_me
Test,Colocation,,600,True
,Seconlocation,,,
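For reference, one way to get exactly that shape is to write the CSV by hand instead of round-tripping through pandas. This is only a sketch, assuming the arrays are parallel (as operator_type and operator_type_id are above) and that scalar values should appear on the first row only:

import csv
import json

with open('export.json') as f:
    obj = json.load(f)

# Split the object into scalar fields and (parallel) array fields.
scalars = {k: v for k, v in obj.items() if not isinstance(v, list)}
arrays = {k: v for k, v in obj.items() if isinstance(v, list)}

n_rows = max(len(v) for v in arrays.values()) if arrays else 1
fieldnames = list(scalars) + list(arrays)

with open('text.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(n_rows):
        # Scalars go on the first row only; later rows hold the
        # remaining array elements and leave the other columns empty.
        row = dict(scalars) if i == 0 else {}
        for key, values in arrays.items():
            if i < len(values):
                row[key] = values[i]
        writer.writerow(row)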
I have the following Excel file with two sheets (posted as screenshots in the original question). I want to convert this Excel file into a JSON format using Python that looks like this:
{
    "app_id_c": "string",
    "cust_id_n": "string",
    "laa_app_a": "string",
    "laa_promc": "string",
    "laa_branch": "string",
    "laa_app_type_o": "string",
    "los_input_from_sas": [
        {
            "lsi_app_id_": "string",
            "lsi_cust_type_c": "string"
        }
    ]
}
I tried using a built-in Excel-to-JSON library, but it gives me a series of flat JSON objects instead of a nested one, and I can't make the second sheet part of the same JSON.
First of all, you should provide a minimal sample that is easy to copy and paste, not an image of your samples. I have created a minimal sample similar to your images; it doesn't change the solution.
Read the xlsx sheets and convert them to lists of dictionaries in Python; then you will have objects like these:
sheet1 = [{
    "app_id_c": "116092749",
    "cust_id_n": "95014843",
    "laa_app_a": "36",
    "laa_promc": "504627",
    "laa_branch": "8",
    "laa_app_type_o": "C",
}]
sheet2 = [
    {
        "lsi_app_id_": "116092749",
        "lsi_cust_type_c": "G",
    },
    {
        "lsi_app_id_": "116092749",
        "lsi_cust_type_c": "G",
    },
]
Once you have the above objects in Python, you can create the desired JSON structure with the following script:
import json

for i in sheet1:
    i["los_input_from_sas"] = list()
    for j in sheet2:
        if i["app_id_c"] == j["lsi_app_id_"]:
            i["los_input_from_sas"].append(j)

sheet1 = json.dumps(sheet1)
print(sheet1)
And this is the printed output:
[
    {
        "app_id_c": "116092749",
        "cust_id_n": "95014843",
        "laa_app_a": "36",
        "laa_promc": "504627",
        "laa_branch": "8",
        "laa_app_type_o": "C",
        "los_input_from_sas": [
            {
                "lsi_app_id_": "116092749",
                "lsi_cust_type_c": "G"
            },
            {
                "lsi_app_id_": "116092749",
                "lsi_cust_type_c": "G"
            }
        ]
    }
]
UPDATE:
Here are some solutions to read xlsx files and convert them to Python dicts.
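For instance, a minimal sketch with pandas (the workbook filename and sheet names here are assumptions; adjust them to your file):

import pandas as pd

# Read every sheet of the workbook; sheet_name=None returns a dict
# of DataFrames keyed by sheet name, and dtype=str keeps the IDs as strings.
workbook = pd.read_excel("input.xlsx", sheet_name=None, dtype=str)
sheet1 = workbook["Sheet1"].to_dict("records")  # list of dicts, one per row
sheet2 = workbook["Sheet2"].to_dict("records")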
I am trying to convert an Excel to nested JSON using Python where the repeated values go in as an array of elements.
Example structure of the CSV:
Manufacturer,oilType,viscosity
shell,superOil,1ova
shell,superOil,2ova
shell,normalOil,1ova
bp, power, 10bba
Should be displayed in JSON (expected output) as
elements: [
    {
        "Manufacturer": "shell",
        "details": [
            {
                "OilType": "superOil",
                "Viscosity": [
                    "1ova",
                    "2ova"
                ]
            },
            {
                "OilType": "normalOil",
                "Viscosity": [
                    "1ova"
                ]
            }
        ]
    },
    {
        "Manufacturer": "bp",
        "details": [
            {
                "OilType": "power",
                "Viscosity": [
                    "10bba"
                ]
            }
        ]
    }
]
I have currently converted the CSV into JSON using openpyxl, and the values are displayed for each of the headers in a format like this (current output):
[{Manufacturer: "shell", oilType: "superOil", Viscosity: "1ova"}, {...}, {...}, ...]
Please help in getting the expected output.
Hi and welcome to Stack Overflow.
Your question actually has nothing to do with openpyxl, because you don't need to save into an Excel file. What you can do, though, is:
1. Load the CSV (or Excel) into a pandas DataFrame
2. Group by Manufacturer and oilType
3. Dump into the format you want
4. Transform to JSON (either string or file)
In practice, that gives something like this:
import json
import pandas as pd

df = pd.read_csv("oil.csv")  # or read_excel if this is an Excel file
oils = df.groupby(["Manufacturer", "oilType"]).aggregate(pd.Series.to_list)
elements = [
    {
        "Manufacturer": manufacturer,
        "details": [
            {"OilType": o, "Viscosity": v}
            for o, v in data.droplevel(0).viscosity.items()
        ],
    }
    for manufacturer, data in oils.groupby(level="Manufacturer")
]
with open("oil.json", "w") as f:
    json.dump({"elements": elements}, f)
For information, oils would look like this:
                            viscosity
Manufacturer oilType
bp           power            [10bba]
shell        normalOil         [1ova]
             superOil    [1ova, 2ova]
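One caveat: the sample CSV above has spaces after the commas in the bp row ("bp, power, 10bba"), so with a real file like that you may want skipinitialspace=True when reading, otherwise the grouped keys come out as ' power' rather than 'power':

# strips the leading spaces in values such as " power" and " 10bba"
df = pd.read_csv("oil.csv", skipinitialspace=True)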
I have one JSON payload which is used for one service request. After processing, that payload (JSON) is stored in S3, and through Athena we can download that data in CSV format. Now, in the actual scenario there are more than 100 fields, and I want to verify their values with an automated script instead of manually.
Say my sample payload is similar to the following:
{
    "BOOK": {
        "serialno": "123",
        "author": "xyz",
        "yearofpublish": "2015",
        "price": "16"
    },
    "Author": [
        {
            "isbn": "xxxxx",
            "title": "first",
            "publisher": "xyz",
            "year": "2020"
        },
        {
            "isbn": "yyyy",
            "title": "second",
            "publisher": "zmy",
            "year": "2019"
        }
    ]
}
The sample CSV will be like the following (it was shown as an image in the original post).
Can anyone please help me with how exactly I can do this in Python? Maybe with a library or a dictionary?
It looks like you just want to flatten out the JSON structure. It'll be easiest to loop over the "Author" list. Since the CSV has renamed the columns, you'll need some way to represent that mapping. Based only on the example, this works:
import json

fin = open(some_json_file, 'r')
j = json.load(fin)

result = []
for author in j['Author']:
    val = {'book_serialno': j['BOOK']['serialno'],
           'book_author': j['BOOK']['author'],
           'book_yearofpublish': j['BOOK']['yearofpublish'],
           'book_price': j['BOOK']['price'],
           'author_isbn': author['isbn'],
           'author_title': author['title'],
           'author_publisher': author['publisher'],
           'author_year': author['year']}
    result.append(val)
This is using a dictionary to show the mapping of data points to the new column names. You might be able to get away with using a list as well; it depends on how you want to use it later on. To write to a CSV:
import csv

fout = open(some_csv_file, 'w')
writer = csv.writer(fout)
writer.writerow(result[0].keys())
writer.writerows(r.values() for r in result)
This writes the column names in the first row, then the data. If you don't want the column names, just leave out the writerow(...) line.
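If you'd rather not manage the header row yourself, a small variant using csv.DictWriter (same some_csv_file placeholder as above) does the same thing:

import csv

with open(some_csv_file, 'w') as fout:
    writer = csv.DictWriter(fout, fieldnames=result[0].keys())
    writer.writeheader()      # column names, taken from the first dict
    writer.writerows(result)  # one row per dict, values matched by key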
I have a data structure like this:
data = [
    {
        "name": "leopard",
        "character": "mean",
        "skills": ["sprinting", "hiding"],
        "pattern": "striped",
    },
    {
        "name": "antilope",
        "character": "good",
        "skills": ["running"],
    },
    .
    .
    .
]
Each key in the dictionaries has values of type integer, string, or list of strings (not all keys are present in all dicts); each dictionary represents a row in a table, and all rows are given as the list of dictionaries.
How can I easily import this into Pandas? I tried
df = pd.DataFrame.from_records(data)
but here I get a "ValueError: arrays must all be same length" error.
The DataFrame constructor takes row-based arrays (amongst other structures) as data input. Therefore, the following works:
import pandas as pd

data = [
    {
        "name": "leopard",
        "character": "mean",
        "skills": ["sprinting", "hiding"],
        "pattern": "striped",
    },
    {
        "name": "antilope",
        "character": "good",
        "skills": ["running"],
    },
]

df = pd.DataFrame(data)
print(df)
Output:
  character      name  pattern               skills
0      mean   leopard  striped  [sprinting, hiding]
1      good  antilope      NaN            [running]
I would like to convert a CSV file into a JSON file using Python 2.7. Below is the Python code I tried, but it is not giving me the expected result. Also, I would like to know if there is a simpler version than mine. Any help is appreciated.
Here is my csv file (SampleCsvFile.csv):
zipcode,date,state,val1,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6
Here is the expected json file (ExpectedJsonFile.json):
{
    "zipcode": "95110",
    "state": "CA",
    "subset": [
        {
            "date": "2015-05-01",
            "val1": "50",
            "val2": "30.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "3"
        },
        {
            "date": "2015-06-01",
            "val1": "67",
            "val2": "31.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "4"
        },
        {
            "date": "2015-07-01",
            "val1": "97",
            "val2": "32.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "6"
        }
    ]
}
Here's the Python code I tried:
import pandas as pd
from itertools import groupby
import json

df = pd.read_csv('SampleCsvFile.csv')
names = df.columns.values.tolist()
data = df.values

master_list2 = [(d["zipcode"], d["state"], d) for d in [dict(zip(names, d)) for d in data]]
intermediate2 = [(k, [x[2] for x in list(v)]) for k, v in groupby(master_list2, lambda t: (t[0], t[1]))]
nested_json2 = [dict(zip(names, (k[0][0], k[0][1], k[1]))) for k in [(i[0], i[1]) for i in intermediate2]]

#print json.dumps(nested_json2, indent=4)
with open('ExpectedJsonFile.json', 'w') as outfile:
    outfile.write(json.dumps(nested_json2, indent=4))
Since you are using pandas already, I tried to get as much mileage as I could out of dataframe methods. I also ended up wandering fairly far afield from your implementation. I think the key here, though, is don't try to get too clever with your list and/or dictionary comprehensions. You can very easily confuse yourself and everyone who reads your code.
import pandas as pd
from itertools import groupby
from collections import OrderedDict
import json

df = pd.read_csv('SampleCsvFile.csv', dtype={
    "zipcode": str,
    "date": str,
    "state": str,
    "val1": str,
    "val2": str,
    "val3": str,
    "val4": str,
    "val5": str
})

results = []
for (zipcode, state), bag in df.groupby(["zipcode", "state"]):
    contents_df = bag.drop(["zipcode", "state"], axis=1)
    subset = [OrderedDict(row) for i, row in contents_df.iterrows()]
    results.append(OrderedDict([("zipcode", zipcode),
                                ("state", state),
                                ("subset", subset)]))

print json.dumps(results[0], indent=4)

#with open('ExpectedJsonFile.json', 'w') as outfile:
#    outfile.write(json.dumps(results[0], indent=4))
The simplest way to have all the JSON datatypes written as strings, and to retain their original formatting, was to force read_csv to parse them as strings. If, however, you need to do any numerical manipulation on the values before writing out the JSON, you will have to let read_csv parse them numerically and coerce them into the proper string format before converting to JSON.
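For example, if you needed to do arithmetic on the val columns first, a minimal sketch of that coercion might look like this (the two-decimal format and the choice of columns are assumptions based on the sample data above):

df = pd.read_csv('SampleCsvFile.csv', dtype={"zipcode": str, "date": str, "state": str})
# ... numerical work on val1..val5 goes here ...
# then coerce the numbers back into the string formats the JSON expects
df["val1"] = df["val1"].astype(int).astype(str)  # e.g. "50"
for col in ["val2", "val3", "val4"]:
    df[col] = df[col].map("{:.2f}".format)       # e.g. "30.00"
df["val5"] = df["val5"].astype(int).astype(str)  # e.g. "3"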