I have the following nested JSON file, which I need to convert into a pandas DataFrame. The main problem is that there is only one unique key in the whole JSON, and it is very deeply nested.
I tried to solve this with the code below, but it gives repeating output.
[{
    "questions": [
        {
            "key": "years-age",
            "responseKey": null,
            "responseText": "27",
            "responseKeys": null
        },
        {
            "key": "gender",
            "responseKey": "male",
            "responseText": null,
            "responseKeys": null
        }
    ],
    "transactions": [
        {
            "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
            "catId": "21001000",
            "tType": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
            "name": "Online Transfer FROM CHECKING 1200454623",
            "category": [
                "Transfer",
                "Acc Transfer"
            ]
        }
    ],
    "institutions": [
        {
            "InstName": "Citizens company",
            "InstId": "inst_1",
            "accounts": [
                {
                    "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
                    "pAccType": "depo",
                    "pAccSubtype": "check",
                    "_id": "5ad38837e806efaa90da4849"
                }
            ]
        }
    ]
}]
I need to convert this to a pandas DataFrame as follows:
id pAccId tId
5ad38837e806efaa90da4849 v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ 80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53
The main problem I am facing is with the "id", as it is very deeply nested and is the only unique key in the JSON.
Here is my code:
import pandas as pd
import json

with open('sub.json') as f:
    data = json.load(f)

csv = ''
for k in data:
    for t in k.get("institutions"):
        csv += k['institutions'][0]['accounts'][0]['_id']
        csv += "\t"
        csv += k['institutions'][0]['accounts'][0]['pAccId']
        csv += "\t"
        csv += k['transactions'][0]['tId']
        csv += "\t"
        csv += "\n"

text_file = open("new_sub.csv", "w")
text_file.write(csv)
text_file.close()
Hope the above code makes sense; I am new to Python.
Read the JSON file and create a dictionary mapping each account's pAccId to the account itself. Collect the transactions as well.
import json
import pandas as pd

with open('sub.json', 'r') as file:
    records = json.load(file)

accounts = {
    account['pAccId']: account
    for record in records
    for institution in record['institutions']
    for account in institution['accounts']
}

transactions = (
    transaction
    for record in records
    for transaction in record['transactions']
)
Open a CSV file. For each transaction, look up its account in the accounts dictionary.
with open('new_sub.csv', 'w') as file:
    file.write('id, pAccId, tId\n')
    for transaction in transactions:
        pAccId = transaction['accId']
        account = accounts[pAccId]
        _id = account['_id']
        tId = transaction['tId']
        file.write(f"{_id}, {pAccId}, {tId}\n")
Finally, read the CSV file into a pandas.DataFrame.
df = pd.read_csv('new_sub.csv')
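An alternative sketch, assuming pandas 1.0+ (where json_normalize is available at the top level): skip the intermediate CSV and build the DataFrame directly by flattening both nested lists and merging on the account id.
import json
import pandas as pd

with open('sub.json') as f:
    records = json.load(f)

# Flatten the deeply nested accounts and the transactions into flat frames.
accounts = pd.json_normalize(records, record_path=['institutions', 'accounts'])
transactions = pd.json_normalize(records, record_path='transactions')

# Join each transaction to its account (accId == pAccId), keep the target columns.
df = transactions.merge(accounts, left_on='accId', right_on='pAccId')
df = df[['_id', 'pAccId', 'tId']].rename(columns={'_id': 'id'})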
I have been trying to convert a JSON file to CSV in Python, but in the CSV I obtain each letter is separated by a comma, rather than the whole word from the key-value pair. The code I tried and the CSV output it produces are given below.
SAMPLE JSON FILE
"details":[
{
"name": "sreekumar, ananthu",
"type": "faculty/academician",
"personal": {
"age": "28",
"address": [
{
"street": "xyz",
"city": "abc",
}
]
}
SAMPLE CODE
import json
import csv

with open("json_data.json", "r") as f:
    data = json.load(f)

csv_file = open("csv_file.csv", "w")
csv_writer = csv.writer(csv_file)

for details in data['details']:
    for detail_key, detail_value in details.items():
        if detail_key == 'name':
            csv_writer.writerow(detail_value)
        if detail_key == 'personal':
            for personal_key, personal_value in detail_value.items():
                if personal_key == 'age':
                    csv_writer.writerow(personal_value)

csv_file.close()
SAMPLE OUTPUT
s,r,e,e,k,u,m,a,ra,n,a,n,t,h,u,2,8
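The letters are split because csv.writer.writerow expects an iterable of column values, so passing a plain string makes every character its own cell. A minimal sketch of a fix, reusing the data and csv_writer from the code above: collect each record's values in a list so each list element becomes one cell.
for details in data['details']:
    # One row per record; each list element becomes one CSV cell.
    name = details.get('name', '')
    age = details.get('personal', {}).get('age', '')
    csv_writer.writerow([name, age])
This writes one row per record, e.g. "sreekumar, ananthu",28 (the name is quoted because it contains a comma).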
I currently have a JSON file containing some data I want to convert to CSV. Here is a sample of the data; please note that I have censored the actual values for security and privacy reasons.
{
    "ID value1": {
        "Id": "ID value1",
        "TechnischContactpersoon": {
            "Naam": "Value",
            "Telefoon": "Value",
            "Email": "Value"
        },
        "Disclaimer": [
            "Value"
        ],
        "Voorzorgsmaatregelen": [
            {
                "Attributes": {},
                "FileId": "value",
                "FileName": "value",
                "FilePackageLocation": "value"
            },
            {
                "Attributes": {},
                "FileId": "value",
                "FileName": "value",
                "FilePackageLocation": "value"
            }
        ]
    },
    "ID value2": {
        "Id": "id value2",
        "TechnischContactpersoon": {
            "Naam": "Value",
            "Telefoon": "Value",
            "Email": "Value"
        },
        "Disclaimer": [
            "Placeholder"
        ],
        "Voorzorgsmaatregelen": [
            {
                "Attributes": {},
                "FileId": "value",
                "FileName": "value",
                "FilePackageLocation": "value"
            }
        ]
    },
Though I know how to do this with a simple JSON string without issues (I already have a function that handles JSON to CSV conversion), I do not know how to do it with a JSON file that has this kind of structure: a second layer beneath the first. As you may have noticed, there is an ID value above each object, so the structure effectively wraps every record in an extra layer. In total I need two kinds of CSV files:
The main CSV file containing just the ID and Disclaimer. This CSV file is called utility networks and contains all possible ID values and the Disclaimer value.
A file containing the "Voorzorgsmaatregelen" values. Because there are multiple values in this section, one CSV file per unique ID is needed, named after that unique ID.
Deleted this part because it was irrelevant.
Data_folder = "Data"
Unazones_file_name = "UnaZones"
Utilitynetworks_file_name = "utilityNetworks"
folder_path_JSON_BS_JSON = folder_path_creation(Data_folder)
pkml_file_path = os.path.join(folder_path_JSON_BS_JSON,"pmkl.json")
print(pkml_file_path)
json_object = json_open(pkml_file_path)
json_content_unazones = json_object.get("mapRequest").get("UnaZones")
json_content_utility_Networks = json_object.get("utilityNetworks")
Unazones_json_location = json_to_save(json_content_unazones,folder_path_JSON_BS_JSON,Unazones_file_name)
csv_file_location_unazones = os.path.join(folder_path_CSV_file_path(Data_folder),(Unazones_file_name+".csv"))
csv_file_location_Utilitynetwork = os.path.join(folder_path_CSV_file_path(Data_folder),(Unazones_file_name+".csv"))
json_content_utility_Networks = json_object.get("utilityNetworks")
Utility_networks_json_location = json_to_save(json_content_utility_Networks,folder_path_JSON_BS_JSON,Utilitynetworks_file_name)
def json_to_csv_convertion(json_file_path: str, csv_file_location: str):
    loaded_json_data = json_open(json_file_path)
    # open a file for writing
    data_file = open(csv_file_location, 'w', newline='')
    # create the csv writer object
    csv_writer = csv.writer(data_file, delimiter=";")
    # counter used so the headers are written only once
    count = 0
    for row in loaded_json_data:
        if count == 0:
            # writing headers of the CSV file
            header = row.keys()
            csv_writer.writerow(header)
            count += 1
        # writing a data row of the CSV file
        csv_writer.writerow(row.values())
    data_file.close()
def folder_path_creation(path: str):
    if not os.path.exists(path):
        os.makedirs(path)
    return path

def json_open(complete_folder_path):
    with open(complete_folder_path) as f:
        json_to_load = json.load(f)
    return json_to_load

def json_to_save(input_json, folder_path: str, file_name: str):
    json_save_location = save_file(input_json, folder_path, file_name, "json")
    return json_save_location
So how do I do this, starting from:
for obj in json_content_utility_Networks:
and going on from there? Keep in mind that this JSON has an extra layer above every object, so for every object I need to start one layer below it.
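A minimal sketch of one way to do it, assuming json_content_utility_Networks is the dict shown in the sample above (top-level keys are the ID values) and keeping the ";" delimiter from your converter. The function name, output file names, and the fieldnames list are assumptions based on the sample; adjust them to the real keys if the data has more fields.
import csv
import os

def utility_networks_to_csv(networks: dict, csv_folder: str):
    # Main file: one row per ID with its Disclaimer value(s).
    with open(os.path.join(csv_folder, 'utilityNetworks.csv'), 'w', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerow(['Id', 'Disclaimer'])
        for network_id, network in networks.items():
            writer.writerow([network_id, '; '.join(network.get('Disclaimer', []))])
    # One file per unique ID, named after it, holding its Voorzorgsmaatregelen.
    for network_id, network in networks.items():
        measures = network.get('Voorzorgsmaatregelen', [])
        if not measures:
            continue
        with open(os.path.join(csv_folder, f'{network_id}.csv'), 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=['FileId', 'FileName', 'FilePackageLocation'],
                                    delimiter=';', extrasaction='ignore')
            writer.writeheader()
            writer.writerows(measures)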
I'm looking to pull the "name" fields from a large JSON text file and store them in another file for later, but I'm getting every piece of data that was in my previous JSON file, albeit slightly modified. How do I make it so I only grab the data after the "name": fields in my JSON file?
I've tried
import json

names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
    for name in data:
        names.append(data[name])

with open('./names.json', 'w') as f:
    for name in names:
        f.write('%s\r\n' % name)
and I'm getting my exact JSON file back, with no formatting and u' in front of everything, likely from json.load(f), but I have no idea how to remedy this.
My text file is formatted like this, if it matters:
{
    "array": [
        {
            "name": "Seranul",
            "id": 5,
            "type": "Paladin",
            "itemLevel": 414,
            "icon": "Paladin-Holy",
            "total": 11107150,
            "activeTime": 2205387,
            "activeTimeReduced": 2205387
        },
        {
            "name": "Contherious",
            "id": 9,
            "type": "Hunter",
            "itemLevel": 412,
            "icon": "Hunter-Marksmanship",
            "total": 51102811,
            "activeTime": 2637303,
            "activeTimeReduced": 2637303
        },
        {
            "name": "Unicorns",
            "id": 17,
            "type": "Priest",
            "itemLevel": null,
            "icon": "Priest",
            "total": 12252005,
            "activeTime": 1768883,
            "activeTimeReduced": 1761797
        },
        ...
    ]
}
I'm expecting to see the corresponding data for each name field, but I'm getting my entire document back.
It looks like your code is ignoring the structure of the JSON data. Specifically, you are iterating through the keys of the JSON dictionary, which is just array, and then appending the value to your names list. This results in the whole array property being put into your names variable.
Here is what I believe you want: iterate through the entries in array and add them to a list, then export that list as JSON to another file.
import json

names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
    for entry in data["array"]:
        names.append(entry["name"])

with open('./names.json', 'w') as f:
    f.write(json.dumps(names))
This will result in the following JSON in names.json:
["Seranul", "Contherious", "Unicorns"]
I'm having trouble generating a well-formatted CSV file out of some data I fetched from the Leadfeeder API. In the CSV file that is currently being created, not all values are in one row: the id and type values end up one row apart from the rest, like here:
[Screenshot: CSV output]
Later I would also like to load another JSON file, use it to map some values via the id, and then also put the visits per lead into my CSV file. Do you have any advice for this as well?
This is my code so far:
import json
import csv

csv_columns = ['name', 'industry', 'website_url', 'status', 'crm_lead_id',
               'crm_organization_id', 'employee_count', 'id', 'type']

with open('data.json', 'r') as d:
    d = json.load(d)

csv_file = 'lead_daten.csv'
try:
    with open(csv_file, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore')
        writer.writeheader()
        for item in d['data']:
            writer.writerow(item)
            writer.writerow(item['attributes'])
except IOError:
    print("I/O error")
My JSON data has the following structure. I also need some of the nested values, like the id inside relationships!
{
    "data": [
        {
            "attributes": {
                "crm_lead_id": null,
                "crm_organization_id": null,
                "employee_count": 5000,
                "facebook_url": null,
                "first_visit_date": "2019-01-31",
                "industry": "Furniture",
                "last_visit_date": "2019-01-31",
                "linkedin_url": null,
                "name": "Example Inc",
                "phone": null,
                "status": "new",
                "twitter_handle": "example",
                "website_url": "http://www.example.com"
            },
            "id": "s7ybF6VxqhQqVM1m1BCnZT_8SRo9XnuoxSUP5ChvERZS9",
            "relationships": {
                "location": {
                    "data": {
                        "id": "8SRo9XnuoxSUP5ChvERZS9",
                        "type": "locations"
                    }
                }
            },
            "type": "leads"
        },
        {
            "attributes": {
                "crm_lead_id": null,
When you write to a CSV, you must write one full row at a time. Your current code writes one row with only id and type, and then a different row with the other fields.
The correct way is to first fully build a dictionary containing all the fields and only then write it in one single operation. The code could be:
...
writer.writeheader()
for item in d['data']:
    # Merge the nested attributes into the top-level item, then write once.
    item.update(item["attributes"])
    writer.writerow(item)
...
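If the nested id from relationships is also needed, it can be flattened into the item in the same loop before writing. A sketch, where location_id is a hypothetical column name (pick any name and add it to csv_columns as well):
for item in d['data']:
    item.update(item["attributes"])
    # location_id is a made-up flat name for relationships -> location -> data -> id
    item['location_id'] = item['relationships']['location']['data']['id']
    writer.writerow(item)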
I'm dumping a MongoDB database to JSON in Python. Here's part of my code:
cursor = collection.find()
with open(json_file_path, 'w') as outfile:
    dump = json.dumps([doc for doc in cursor], sort_keys=False, indent=4, default=json_util.default)
    outfile.write(dump)
The problem is that pymongo adds an _id field by itself and creates an entry like "_id": {"$oid": "5c2b4813e43eda7815444204"}. This causes a key '$oid' must not start with '$' error when loading from this JSON file. So I was thinking: could I either modify this field or skip it altogether while exporting the database? How can I do that?
{
    "Employee ID": 9771504,
    "NAME": "Harsh Wardhan",
    "DOB": "14-Apr",
    "MOBILE": 12345697890,
    "Group": "SW-VS",
    "_id": {
        "$oid": "5c2b4813e43eda7815444204"
    },
    "Emai ID": "hwardhan#examples.com"
}
Assuming the extra id is added for each entry in the cursor, you can just filter it out before writing using a dict comprehension.
cursor = collection.find()
with open(json_file_path, 'w') as outfile:
    dump = json.dumps([{k: v for k, v in doc.items() if k != "_id"} for doc in cursor],
                      sort_keys=False, indent=4, default=json_util.default)
    outfile.write(dump)
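Alternatively, pymongo's find accepts a projection as its second argument, so _id can be excluded at query time and there is nothing to filter out afterwards:
# Exclude _id from every returned document via a projection.
cursor = collection.find({}, {'_id': False})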