I have a JSON file (>1 GB) and a CSV file that share one matching column (the ID). I need to update the JSON file with the values from the CSV file.
My first idea was to convert the JSON to CSV and then overwrite the CSV, but since the file is huge, that is not the most efficient way. I am supposed to use Python.
import csv
import json
id = []
qrank = []
def readingCsvFile():
    with open('qrank.csv', 'r') as csvFile:
        dataCsv = csv.reader(csvFile)
        for row in dataCsv:
            id.append(row[0])
            qrank.append(row[1])
dataJson = [json.loads(line) for line in open('enhanced-wikipois','r', encoding='UTF-8')]
records = len(dataJson)
readingCsvFile()
for i in range(records):
    x = dataJson[i]['id']
    if (x in id):
        pos = id.index(x)
        dataJson[i]['wikiQRank'] = qrank[pos]
print(dataJson)
The size of the file is not really relevant. What's important is the number of JSON objects and the number of "qrank" values.
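If you build a dictionary that maps id to rank from the CSV file, the subsequent lookups will be much more efficient: "x in id" and id.index(x) each scan the whole list, whereas a dictionary lookup takes roughly constant time.
Other improvements are possible too, such as not holding every JSON object in memory at once.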
import csv
import json
CSVFILE = '/Volumes/G-Drive/qrank.csv'
JSONLFILE = '/Volumes/G-Drive/enhanced-wikipois'
def read_csv(filename):
    with open(filename, newline='') as data:
        reader = csv.reader(data)
        return {_id: rank for _id, rank, *_ in reader}
def read_jsonl(filename):
    with open(filename) as data:
        return [json.loads(line) for line in data]
id_dict = read_csv(CSVFILE)
json_data = read_jsonl(JSONLFILE)
for j in json_data:
    if (_id := j.get('id')) is not None:
        if (rank := id_dict.get(_id)) is not None:
            j['wikiQRank'] = rank
print(json_data)
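One such further improvement, sketched under assumptions: because the input holds one JSON object per line, it can be processed line by line and written straight to a new file, so the full dataset never has to sit in memory. The names load_ranks and update_jsonl and the idea of a separate output path are hypothetical; the 'id' and 'wikiQRank' keys come from the question. It also assumes the ids in the JSON are strings, as the CSV values are.

```python
import csv
import json


def load_ranks(csv_path):
    """Build an id -> rank dictionary from the first two CSV columns."""
    with open(csv_path, newline='', encoding='utf-8') as data:
        return {row[0]: row[1] for row in csv.reader(data) if len(row) >= 2}


def update_jsonl(src_path, dst_path, ranks):
    """Copy src_path to dst_path line by line, adding 'wikiQRank' on id match.

    Returns the number of records that were updated.
    """
    updated = 0
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: ids are strings in both files, so they compare equal.
            rank = ranks.get(record.get('id'))
            if rank is not None:
                record['wikiQRank'] = rank
                updated += 1
            dst.write(json.dumps(record, ensure_ascii=False) + '\n')
    return updated
```

Only one record is held in memory at a time, at the cost of producing a second file that then replaces the original.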
I have a problem converting a JSON file to a CSV file in Python.
I think the cause is that the JSON is nested, but I don't know how to handle it!
import csv
import json, requests
url = requests.get("https://####/api/food_orders")
text = url.text
# json.loads parses a string; json.load expects a file object
data = json.loads(text)
order_data = data['data']
# now we will open a file for writing
data_file = open('ordersJsonToCsv.csv', 'w', newline='')
# create the csv writer object
csv_writer = csv.writer(data_file)
# Counter variable used for writing
# headers to the CSV file
count = 0
for ord in order_data:
    if count == 0:
        # Writing headers of CSV file
        header = ord.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(ord.values())
data_file.close()
And the JSON file looks like this:
This code solves the problem. It takes only the "data" part of the response and flattens it into columns:
import pandas as pd
import json, requests
url = requests.get("https://##/api/orders?")
text = url.text
info = json.loads(text)
# json_normalize flattens nested dicts into dotted column names
df = pd.json_normalize(info['data'])
df.to_csv("samplecsv.csv")
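For readers without pandas, the flattening of nested dictionaries can be approximated with the standard library alone. This is only a sketch: the helper names flatten and write_rows are hypothetical, the sample orders are invented for illustration, and nested lists are left untouched rather than expanded.

```python
import csv
import io


def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def write_rows(records, out_file):
    """Write flattened records as CSV; the header is the union of all keys."""
    rows = [flatten(r) for r in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills columns that a given record does not have
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)


# Invented sample data standing in for info['data']
orders = [
    {'id': 1, 'customer': {'name': 'Ann', 'city': 'Oslo'}},
    {'id': 2, 'customer': {'name': 'Bo'}, 'note': 'rush'},
]
buffer = io.StringIO()
write_rows(orders, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of a string buffer, pass a file opened with newline='' as out_file.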
I have been trying to write a script that converts JSON files to CSV files, using a template I found online. However, I keep getting the following error:
header = User.keys()
AttributeError: 'str' object has no attribute 'keys'
Here is the code I'm using:
# Python program to convert
# JSON file to CSV
import json
import csv
# Opening JSON file and loading the data
# into the variable data
with open('/Users/jhillier5/Downloads/mentalhealthcheck-in-default-rtdb-export (1).json') as json_file:
    data = json.load(json_file)
User_data = data['User']
# now we will open a file for writing
data_file = open('data_file.csv', 'w')
# create the csv writer object
csv_writer = csv.writer(data_file)
# Counter variable used for writing
# headers to the CSV file
count = 0
print(User_data)
print(type(User_data))
for User in User_data:
    if count == 0:
        # Writing headers of CSV file
        header = User.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(User.values())
data_file.close()
I simply don't understand why there is a string object and not a dict. I double-checked that User_data is stored as a dict, but I still get the error.
The json is organised in the following format:
{
  "User" : {
    "-MlIoVUgATqwtD_3AeCb" : {
      "Aid" : "1",
      "Gender" : "1",
      "Grade" : "11"
    }
  }
}
In this for loop
for User in User_data:
    if count == 0:
        # Writing headers of CSV file
        header = User.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(User.values())
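You are iterating over User_data, and iterating over a dict yields its keys, so each User is a string such as "-MlIoVUgATqwtD_3AeCb" rather than the nested dict. To get the keys of that user's record, index User_data with it (the same applies to User.values(), which becomes User_data[User].values()):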
header = User_data[User].keys()
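Put together, the loop could look like the sketch below. It iterates with .values() so that each item is already the inner dict. The function name write_users and the in-memory sample are assumptions made for illustration, and the sketch assumes every user record has the same fields in the same order.

```python
import csv
import io


def write_users(user_data, out_file):
    """Write one CSV row per user record; header comes from the first record."""
    writer = csv.writer(out_file)
    header_written = False
    for record in user_data.values():  # each record is a dict, not a key string
        if not header_written:
            writer.writerow(record.keys())
            header_written = True
        writer.writerow(record.values())


# Sample shaped like the export in the question
data = {"User": {"-MlIoVUgATqwtD_3AeCb": {"Aid": "1", "Gender": "1", "Grade": "11"}}}
buffer = io.StringIO()
write_users(data["User"], buffer)
csv_text = buffer.getvalue()
```

With a real file, open it with newline='' and pass it in place of the string buffer.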
How can I create an empty JSON file and then append each record to it, in the following format?
[
{"name":"alan","job":"clerk"},
{"name":"bob","job":"engineer"}
]
Code
import json
with open("test.json", mode='w', encoding='utf-8') as f:
    json.dump([], f)
test_data = ['{"name":"alan","job":"clerk"}','{"name":"bob","job":"engineer"}']
for i in test_data:
    with open("test.json", mode='w', encoding='utf-8') as fileobj:
        json.dump(i, fileobj)
How can this be done efficiently?
You can't modify the JSON content like that. You need to change the data structure in memory and then rewrite the JSON file completely. You might be able to read the data from the JSON file once at startup and write it back at shutdown.
import json
def store_my_data(data, filename='test.json'):
    """ write data to json file """
    with open(filename, mode='w', encoding='utf-8') as f:
        json.dump(data, f)
def load_my_data(filename='test.json'):
    """ load data from json file """
    with open(filename, mode='r', encoding='utf-8') as f:
        return json.load(f)
# (skipping some steps here)
test_data = [
    {"name": "alan", "job": "clerk"},
    {"name": "bob", "job": "engineer"}
]
item_one = test_data[0]
item_two = test_data[1]
# You already know how to store data in a json file.
store_my_data(test_data)
# Suppose you don't have any data at the start.
current_data = []
store_my_data(current_data)
# Later, you want to add to the data.
# You will have to change your data in memory,
# then completely rewrite the file.
current_data.append(item_one)
current_data.append(item_two)
store_my_data(current_data)
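If the data cannot be kept in memory between additions, the same read-modify-rewrite cycle can be wrapped in one helper. This is a minimal sketch, assuming the file always holds a JSON list; the name append_record is hypothetical. Each call rereads and rewrites the whole file, so it only suits small files.

```python
import json
import os


def append_record(record, filename='test.json'):
    """Load the JSON list from filename, append record, and rewrite the file."""
    if os.path.exists(filename):
        with open(filename, mode='r', encoding='utf-8') as f:
            data = json.load(f)
    else:
        data = []  # no file yet: start with an empty list
    data.append(record)
    with open(filename, mode='w', encoding='utf-8') as f:
        json.dump(data, f)
    return data
```

Calling append_record({"name": "alan", "job": "clerk"}) and then append_record({"name": "bob", "job": "engineer"}) leaves a file containing a list of both dictionaries.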
I'm attempting to parse a JSON file with the following syntax into CSV:
{"code":2000,"message":"SUCCESS","data":
  {"1":
    {"id":1,
     "name":"first_name",
     "icon":"url.png",
     "attribute1":"value",
     "attribute2":"value" ...},
   "2":
    {"id":2,
     "name":"first_name",
     "icon":"url.png",
     "attribute1":"value",
     "attribute2":"value" ...},
   "3":
    {"id":3,
     "name":"first_name",
     "icon":"url.png",
     "attribute1":"value",
     "attribute2":"value" ...}, and so forth
  }}
I have found similar questions (e.g. here and here), and I am working with the following method:
import requests
import json
import csv
import os
jsonfile = "/path/to.json"
csvfile = "/path/to.csv"
with open(jsonfile) as json_file:
    data=json.load(json_file)
data_file = open(csvfile,'w')
csvwriter = csv.writer(data_file)
csvwriter.writerow(data["data"].keys())
for row in data:
    csvwriter.writerow(row["data"].values())
data_file.close()
but I am missing something.
I get this error when I try to run:
TypeError: string indices must be integers
and my csv output is:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,96
At the end of the day, I am trying to convert the following function (from PowerShell) to Python. This converted the JSON to CSV and added 3 additional custom columns to the end:
$json = wget $lvl | ConvertFrom-Json
$json.data | %{$_.psobject.properties.value} `
| select-object *,@{Name='Custom1';Expression={$m}},@{Name='Level';Expression={$l}},@{Name='Custom2';Expression={$a}},@{Name='Custom3';Expression={$r}} `
| Export-CSV -path $outfile
The output looks like:
"id","name","icon","attribute1","attribute2",..."Custom1","Custom2","Custom3"
"1","first_name","url.png","value","value",..."a","b","c"
"2","first_name","url.png","value","value",..."a","b","c"
"3","first_name","url.png","value","value",..."a","b","c"
As suggested by martineau in a now-deleted answer, my key name was incorrect.
I ended up with this:
import json
import csv
jsonfile = "/path/to.json"
csvfile = "/path/to.csv"
with open(jsonfile) as json_file:
    data = json.load(json_file)
# newline='' stops the csv module from writing blank rows on Windows
data_file = open(csvfile, 'w', newline='')
csvwriter = csv.writer(data_file)
# get sample keys
header = data["data"]["1"].keys()
# add new fields to the header
keys = list(header)
keys.append("field2")
keys.append("field3")
# write header
csvwriter.writerow(keys)
# for each entry
total = data["data"]
for row in total:
    rowdefault = data["data"][str(row)].values()
    rowdata = list(rowdefault)
    rowdata.append("value1")
    rowdata.append("value2")
    csvwriter.writerow(rowdata)
data_file.close()
Here, I'm looking up each row by its key (the id as a string) via str(row).
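The same result can be had without looking each entry up by key, by iterating over the inner dictionaries directly. The following is a sketch under assumptions: the function name write_with_extras and the sample entries are hypothetical, the extra column names are borrowed from the PowerShell output in the question, and all entries are assumed to share the same keys in the same order.

```python
import csv
import io


def write_with_extras(entries, out_file, extra_columns):
    """Write each entry's values as a CSV row, followed by fixed extra columns.

    entries: dict of dicts such as data["data"]; the header is taken from
    the first entry.
    extra_columns: dict mapping each new column name to a constant value.
    """
    writer = csv.writer(out_file)
    rows = list(entries.values())
    if not rows:
        return  # nothing to write
    writer.writerow(list(rows[0].keys()) + list(extra_columns.keys()))
    for row in rows:
        writer.writerow(list(row.values()) + list(extra_columns.values()))


# Hypothetical sample shaped like the "data" object in the question
sample = {
    "1": {"id": 1, "name": "first_name", "icon": "url.png"},
    "2": {"id": 2, "name": "first_name", "icon": "url.png"},
}
buffer = io.StringIO()
write_with_extras(sample, buffer, {"Custom1": "a", "Custom2": "b"})
csv_text = buffer.getvalue()
```

If entries may have differing keys, csv.DictWriter with an explicit fieldnames list would be the safer choice.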