Parse JSON to CSV + additional columns - python
I'm attempting to parse a JSON file with the following syntax into CSV:
{"code":2000,"message":"SUCCESS","data":
{"1":
{"id":1,
"name":"first_name",
"icon":"url.png",
"attribute1":"value",
"attribute2":"value" ...},
"2":
{"id":2,
"name":"first_name",
"icon":"url.png",
"attribute1":"value",
"attribute2":"value" ...},
"3":
{"id":3,
"name":"first_name",
"icon":"url.png",
"attribute1":"value",
"attribute2":"value" ...}, and so forth
}}}
I have found similar questions (e.g. here and here) and I am working with the following method:
import requests
import json
import csv
import os

jsonfile = "/path/to.json"
csvfile = "/path/to.csv"

with open(jsonfile) as json_file:
    data = json.load(json_file)

data_file = open(csvfile, 'w')
csvwriter = csv.writer(data_file)
csvwriter.writerow(data["data"].keys())
for row in data:
    csvwriter.writerow(row["data"].values())
data_file.close()
but I am missing something.
I get this error when I try to run it:
TypeError: string indices must be integers
and my csv output is:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,96
At the end of the day, I am trying to convert the following PowerShell function to Python. It converts the JSON to CSV and adds 3 additional custom columns at the end:
$json = wget $lvl | ConvertFrom-Json
$json.data | %{$_.psobject.properties.value} `
    | Select-Object *,@{Name='Custom1';Expression={$m}},@{Name='Level';Expression={$l}},@{Name='Custom2';Expression={$a}},@{Name='Custom3';Expression={$r}} `
    | Export-CSV -Path $outfile
The output looks like:
"id","name","icon","attribute1","attribute2",..."Custom1","Custom2","Custom3"
"1","first_name","url.png","value","value",..."a","b","c"
"2","first_name","url.png","value","value",..."a","b","c"
"3","first_name","url.png","value","value",..."a","b","c"
As suggested by martineau in a now-deleted answer, my key name was incorrect: iterating over data yields the top-level keys ("code", "message", "data") as strings, so row["data"] was indexing into a string, and the single row in my CSV came from writing data["data"].keys() as the header.
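A tiny sketch (using a trimmed-down copy of the data above) that illustrates the failure mode:

# iterating a dict yields its keys; at the top level these are strings
data = {"code": 2000, "message": "SUCCESS", "data": {"1": {"id": 1}}}
for row in data:
    print(row)     # prints 'code', 'message', 'data'
    # row["data"]  # TypeError: string indices must be integers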
I ended up with this:
import json
import csv

jsonfile = "/path/to.json"
csvfile = "/path/to.csv"

with open(jsonfile) as json_file:
    data = json.load(json_file)

data_file = open(csvfile, 'w')
csvwriter = csv.writer(data_file)

# get sample keys from the first entry
header = data["data"]["1"].keys()

# add the new fields to the header
keys = list(header)
keys.append("field2")
keys.append("field3")

# write header
csvwriter.writerow(keys)

# write one row per entry
total = data["data"]
for row in total:
    rowdefault = data["data"][str(row)].values()
    rowdata = list(rowdefault)
    rowdata.append("value1")
    rowdata.append("value2")
    csvwriter.writerow(rowdata)

# close to flush the rows to disk
data_file.close()
Here, I'm grabbing each entry by its key name via str(row); the keys are already strings, so the str() call is just a safeguard.
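For completeness, the same output can be produced with csv.DictWriter, which also stays correct if the entries' keys ever arrive in a different order. This is a minimal sketch rather than the code I ran; the custom column names and values (Custom1/Custom2/Custom3 with a/b/c) are placeholders mirroring the PowerShell output above:

import json
import csv

jsonfile = "/path/to.json"
csvfile = "/path/to.csv"

with open(jsonfile) as json_file:
    data = json.load(json_file)

entries = data["data"]

# field names from the first entry, plus the custom columns at the end
fieldnames = list(next(iter(entries.values())).keys()) + ["Custom1", "Custom2", "Custom3"]

with open(csvfile, 'w', newline='') as data_file:
    writer = csv.DictWriter(data_file, fieldnames=fieldnames)
    writer.writeheader()
    for entry in entries.values():
        # placeholder custom values; substitute whatever your script computes
        writer.writerow({**entry, "Custom1": "a", "Custom2": "b", "Custom3": "c"})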
Related
Python json to csv: "AttributeError: 'str' object has no attribute 'keys'"
So I have been attempting to create a file to convert json files to csv files using a template I found online. However, I keep receiving the following error:

header = User.keys()
AttributeError: 'str' object has no attribute 'keys'

Here is the code I'm using:

# Python program to convert
# JSON file to CSV
import json
import csv

# Opening JSON file and loading the data
# into the variable data
with open('/Users/jhillier5/Downloads/mentalhealthcheck-in-default-rtdb-export (1).json') as json_file:
    data = json.load(json_file)

User_data = data['User']

# now we will open a file for writing
data_file = open('data_file.csv', 'w')

# create the csv writer object
csv_writer = csv.writer(data_file)

# Counter variable used for writing
# headers to the CSV file
count = 0
print(User_data)
print(type(User_data))

for User in User_data:
    if count == 0:
        # Writing headers of CSV file
        header = User.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(User.values())

data_file.close()

I simply don't understand why there is a string object and not a dict. I double checked and User_data is stored as a dict but I still get the error. The json is organised in the following format:

{
    "User" : {
        "-MlIoVUgATqwtD_3AeCb" : {
            "Aid" : "1",
            "Gender" : "1",
            "Grade" : "11"
        }
    }
}
In this for loop:

for User in User_data:
    if count == 0:
        # Writing headers of CSV file
        header = User.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(User.values())

you are iterating through User_data, so User takes the values "-MlIoVUgATqwtD_3AeCb" and so on: the string keys, not the nested dicts. In order to get the keys of that user's data you should index it this way:

header = User_data[User].keys()
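A minimal corrected sketch of the loop under the same User_data structure (the surrounding file handling from the question is unchanged):

count = 0
for User in User_data:                       # User is a string key
    record = User_data[User]                 # the per-user dict
    if count == 0:
        csv_writer.writerow(record.keys())   # header from the first record
        count += 1
    csv_writer.writerow(record.values())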
Convert json file to csv
I have this simple python code that converts a json file into a csv. I would like to convert only the first four values of each key, but I couldn't figure out how to do it.

import json
import csv

# Opening JSON file and loading the data
# into the variable data
with open('personas.json') as json_file:
    data = json.load(json_file)

employee_data = data['emp_details']

# now we will open a file for writing
data_file = open('data_file.csv', 'w')

# create the csv writer object
csv_writer = csv.writer(data_file)

# Counter variable used for writing
# headers to the CSV file
count = 0

for emp in employee_data:
    if count == 0:
        # Writing headers of CSV file
        header = emp.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(emp.values())

data_file.close()

Here is an example of the json file's format:

{"emp_details": [
    {
        "DATAID": "6908443",
        "FIRST_NAME": "Fernando",
        "SECOND_NAME": "Fabbiano",
        "THIRD_NAME": "Agustin",
        "FOURTH_NAME": "",
        "AGE": "21",
        "EMAIL": "fer.fab#gmail.com"
    }
]}

And as I said, I would like to convert only the fields DATAID, FIRST_NAME, SECOND_NAME, THIRD_NAME.
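One way to restrict the output to those four fields is csv.DictWriter with extrasaction='ignore', which drops every key not in the field list. This is a sketch of a possible fix (the file names follow the question; the approach itself is my assumption, not a quoted answer):

import json
import csv

with open('personas.json') as json_file:
    data = json.load(json_file)

wanted = ["DATAID", "FIRST_NAME", "SECOND_NAME", "THIRD_NAME"]

with open('data_file.csv', 'w', newline='') as data_file:
    # extrasaction='ignore' silently drops keys that are not listed in `wanted`
    writer = csv.DictWriter(data_file, fieldnames=wanted, extrasaction='ignore')
    writer.writeheader()
    for emp in data['emp_details']:
        writer.writerow(emp)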
How to convert bulk of csv into json
Below is the code to convert one csv file into json. Listing all the files in the dir gives [data1.csv, data2.csv, data.csv], and I need to loop over all of them.

import csv
import json
import os
import glob

os.chdir(r'dir')
result = glob.glob('*.csv')
print(result)

def make_json(csvFilePath, jsonFilePath):
    for i in result:
        data = {}
        with open(csvFilePath, encoding='utf-8') as csvf:
            csvReader = csv.DictReader(csvf)
            for rows in csvReader:
                key = rows['id']
                data[key] = rows
        with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
            jsonf.write(json.dumps(data, indent=4))
        csvFilePath = f"{i}"
        jsonFilePath = f"{i.split('.')[-2]}.json"

make_json(csvFilePath, jsonFilePath)

I have similar kinds of files which need to be converted into json; in every file, id is the primary key. There are a lot of files, and I don't want to change csvFilePath and jsonFilePath every time. The issue I'm facing: all my json files end with .csv.json, the last file, data9.csv, is not converted to json, and data2 is converted twice, as data2.json and data2.csv.json.
Here is an approach:

import glob
import pandas as pd

for csv_f in glob.glob('/<file_path>/*.csv'):
    with open(f'{csv_f.replace(".csv", ".json")}', "w") as f:
        pd.read_csv(csv_f).set_index('id') \
          .to_json(f, orient='index')
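If pandas is not an option, here is a minimal stdlib sketch of the same bulk conversion (my own rewrite, not the asker's code): deriving the output name once from the input name sidesteps the .csv.json names and the double conversion described in the question.

import csv
import json
import glob
import os

for csv_path in glob.glob('*.csv'):
    # data2.csv -> data2.json, computed once per file
    json_path = os.path.splitext(csv_path)[0] + '.json'
    data = {}
    with open(csv_path, encoding='utf-8') as csvf:
        for row in csv.DictReader(csvf):
            data[row['id']] = row  # 'id' is the primary key in every file
    with open(json_path, 'w', encoding='utf-8') as jsonf:
        json.dump(data, jsonf, indent=4)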
Python create list of dictionaries from csv on S3
I am trying to take a CSV and create a list of dictionaries in python with the CSV coming from S3. Code is as follows:

import os
import boto3
import csv
import json
from io import StringIO
import logging
import time

s3 = boto3.resource('s3')
s3Client = boto3.client('s3', 'us-east-1')
bucket = 'some-bucket'
key = 'some-key'

obj = s3Client.get_object(Bucket=bucket, Key=key)
lines = obj['Body'].read().decode('utf-8').splitlines(True)

newl = []
for line in csv.reader(lines, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True, escapechar="\\"):
    newl.append(line)

fieldnames = newl[0]
newl1 = newl[1:]
reader = csv.DictReader(newl1, fieldnames)
out = json.dumps([row for row in reader])
jlist1 = json.loads(out)

but this gives me the error:

iterator should return strings, not list (did you open the file in text mode?)

If I alter the for loop to this:

for line in csv.reader(lines, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True, escapechar="\\"):
    newl.append(','.join(line))

then it works; however, some fields have commas in them, so this completely screws up the schema and shifts the data. For example:

|address1   |address2  |state|
------------------------------
|123 Main st|APT 3, Fl1|TX   |

becomes:

|address1   |address2|state|null|
---------------------------------
|123 Main st|APT 3   |Fl1  |TX  |

Where am I going wrong?
The problem is that you are building a list of lists here:

newl.append(line)

and as the error says: iterator should return strings, not list. So try to cast line as a string:

newl.append(str(line))

Hope this helps :)
I ended up changing the code to this:

obj = s3Client.get_object(Bucket=bucket, Key=key)
lines1 = obj['Body'].read().decode('utf-8').split('\n')
fieldnames = lines1[0].replace('"', '').split(',')
testls = [row for row in csv.DictReader(lines1[1:], fieldnames)]
out = json.dumps([row for row in testls])
jlist1 = json.loads(out)

and got the desired result.
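For what it's worth, csv.DictReader can also consume the decoded lines directly and take its header from the first row, which avoids the manual header split and keeps quoted commas (like "APT 3, Fl1") intact. A minimal sketch reusing the bucket/key placeholders from the question:

import csv
import json
import boto3

s3Client = boto3.client('s3', 'us-east-1')
obj = s3Client.get_object(Bucket='some-bucket', Key='some-key')

# splitlines() keeps each physical line; quoted commas stay inside their fields
lines = obj['Body'].read().decode('utf-8').splitlines()

# with no explicit fieldnames, DictReader takes the header from the first row
jlist1 = [dict(row) for row in csv.DictReader(lines)]
print(json.dumps(jlist1[:2], indent=2))  # peek at the first two records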
Python read JSON file and modify
Hi, I am trying to take the data from a json file, insert an id, and then perform a POST REST call. My file data.json has:

{
    'name': 'myname'
}

and I would like to add an id so that the json data looks like:

{
    'id': 134,
    'name': 'myname'
}

So I tried:

import json

f = open("data.json", "r")
data = f.read()
jsonObj = json.loads(data)

I can't get the json file to load. What should I do so that I can convert the json file into a json object and add another id value?
Set the item using data['id'] = ...:

import json

with open('data.json', 'r+') as f:
    data = json.load(f)
    data['id'] = 134  # <--- add `id` value.
    f.seek(0)         # <--- should reset file position to the beginning.
    json.dump(data, f, indent=4)
    f.truncate()      # remove remaining part
falsetru's solution is nice, but has a little bug: suppose the original 'id' length was larger than 5 characters. When we then dump with the new 'id' (134, with only 3 characters), the length of the string written from position 0 in the file is shorter than the original length. Extra chars (such as '}') are left in the file from the original content. I solved that by replacing the original file:

import json
import os

filename = 'data.json'
with open(filename, 'r') as f:
    data = json.load(f)
    data['id'] = 134  # <--- add `id` value.

os.remove(filename)
with open(filename, 'w') as f:
    json.dump(data, f, indent=4)
I would like to present a modified version of Vadim's solution. It helps to deal with asynchronous requests to write/modify the json file. I know it wasn't a part of the original question but might be helpful for others. In case of asynchronous file modification, os.remove(filename) will raise FileNotFoundError if requests emerge frequently. To overcome this problem you can create a temporary file with the modified content and then rename it, simultaneously replacing the old version. This solution works fine both for synchronous and asynchronous cases.

import os, json, uuid

filename = 'data.json'
with open(filename, 'r') as f:
    data = json.load(f)
    data['id'] = 134  # <--- add `id` value.
    # add, remove, modify content

# create randomly named temporary file to avoid
# interference with other thread/asynchronous request
tempfile = os.path.join(os.path.dirname(filename), str(uuid.uuid4()))
with open(tempfile, 'w') as f:
    json.dump(data, f, indent=4)

# rename temporary file replacing old file
os.rename(tempfile, filename)
There are really quite a number of ways to do this and all of the above are in one way or another valid approaches... Let me add a straightforward proposition. So assuming your current existing json file looks like this:

{
    "name": "myname"
}

And you want to bring in this new json content (adding key "id"):

{
    "id": "134",
    "name": "myname"
}

My approach has always been to keep the code extremely readable with easily traceable logic. So first, we read the entire existing json file into memory, assuming you are very well aware of your json's existing key(s):

import json

# first, get the absolute path to json file
PATH_TO_JSON = 'data.json'  # assuming same directory (but you can work your magic here with os.)

# read existing json to memory. you do this to preserve whatever existing data.
with open(PATH_TO_JSON, 'r') as jsonfile:
    json_content = json.load(jsonfile)  # this is now in memory! you can use it outside 'open'

Next, we use the 'with open()' syntax again, with the 'w' option. 'w' is a write mode which lets us edit and write new information to the file. Here's the catch that works for us: any existing json with the same target write name will be erased automatically. So what we can do now is simply write to the same filename with the new data:

# add the id key-value pair (rmbr that it already has the "name" key value)
json_content["id"] = "134"

with open(PATH_TO_JSON, 'w') as jsonfile:
    json.dump(json_content, jsonfile, indent=4)  # you decide the indentation level

And there you go! data.json should be good to go for a good old POST request.
Try this script:

with open("data.json") as f:
    data = json.load(f)
data["id"] = 134
json.dump(data, open("data.json", "w"), indent=4)

The result is:

{
    "name": "myname",
    "id": 134
}

Just the arrangement is different. You can solve that by converting the data to a list, arranging it as you wish, converting it back, and saving the file, like this:

index_add = 0
with open("data.json") as f:
    data = json.load(f)
data_li = [[k, v] for k, v in data.items()]
data_li.insert(index_add, ["id", 134])
data = {data_li[i][0]: data_li[i][1] for i in range(0, len(data_li))}
json.dump(data, open("data.json", "w"), indent=4)

The result is:

{
    "id": 134,
    "name": "myname"
}

You can add an if condition in order not to repeat the key, and just change it instead, like this:

index_add = 0
n_k = "id"
n_v = 134
with open("data.json") as f:
    data = json.load(f)
if n_k in data:
    data[n_k] = n_v
else:
    data_li = [[k, v] for k, v in data.items()]
    data_li.insert(index_add, [n_k, n_v])
    data = {data_li[i][0]: data_li[i][1] for i in range(0, len(data_li))}
json.dump(data, open("data.json", "w"), indent=4)
This implementation should suffice:

with open(jsonfile, 'r') as file:
    data = json.load(file)

data['id'] = value

with open(jsonfile, 'w') as file:
    json.dump(data, file)

It uses a context manager for opening the jsonfile. data holds the updated object, which is then dumped into the overwritten jsonfile in 'w' mode.
Not exactly your solution, but it might help some people solving this issue with keys. I have a list of files in a folder, and I need to make JSON out of it with keys. After many hours of trying, the solution is simple:

async def return_file_names():
    dir_list = os.listdir("./tmp/")
    json_dict = {"responseObj": [{"Key": dir_list.index(value), "Value": value} for value in dir_list]}
    print(json_dict)
    return json_dict

The response looks like this:

{
  "responseObj": [
    { "Key": 0, "Value": "bottom_mask.GBS" },
    { "Key": 1, "Value": "bottom_copper.GBL" },
    { "Key": 2, "Value": "copper.GTL" },
    { "Key": 3, "Value": "soldermask.GTS" },
    { "Key": 4, "Value": "ncdrill.DRD" },
    { "Key": 5, "Value": "silkscreen.GTO" }
  ]
}