CSV to JSON with Specific format - python

I have a CSV file with the data below, and I would like to convert it to JSON using Python.
I would like to convert it into the JSON format shown below. Can you tell me which library I should use, or suggest some pseudocode?
I am able to convert it to standard JSON with key-value pairs, but I don't know how to produce the JSON format below.
"T-shirt","Long-tshirt",18
"T-shirt","short-tshirt"19
"T-shirt","half-tshirt",20
"top","very-nice",45
"top","not-nice",56
{
"T-shirts":[
{
"name":"Long-tshirt",
"size":"18"
},
{
"name":"short-tshirt",
"size":"19"
},
{
"name":"half-tshirt",
"size":"20"
},
],
"top":[
{
"name":"very-nice"
"size":45
},
{
"name":"not-nice"
"size":45
},
]
}

For this code, I put your CSV into a file named test.csv (as a heads up, the provided data was missing a comma before the 19):
"T-shirt","Long-tshirt",18
"T-shirt","short-tshirt",19
"T-shirt","half-tshirt",20
"top","very-nice",45
"top","not-nice",56
Then, using the built-in csv and json modules, you can iterate over each row and add it to a dictionary. I used a defaultdict to save time, then wrote the data out to a JSON file.
import csv, json
from collections import defaultdict
my_data = defaultdict(list)
with open("test.csv", newline="") as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        if row: # To ignore blank lines
            my_data[row[0]].append({"name": row[1], "size": row[2]})
with open("out.json", "w") as out_file:
    json.dump(my_data, out_file, indent=2)
Generated out file:
{
"T-shirt": [
{
"name": "Long-tshirt",
"size": "18"
},
{
"name": "short-tshirt",
"size": "19"
},
{
"name": "half-tshirt",
"size": "20"
}
],
"top": [
{
"name": "very-nice",
"size": "45"
},
{
"name": "not-nice",
"size": "56"
}
]
}
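If you want the sizes as numbers rather than strings (the desired output in the question mixes both), here is a minimal sketch of the same grouping idea. It reads the sample data from an in-memory string instead of test.csv so it runs on its own; the function name group_rows and the decision to convert every size with int() are my assumptions, not something the question specifies.

```python
import csv
import io
import json
from collections import defaultdict

# Sample data from the question, with the missing comma restored.
CSV_TEXT = '''"T-shirt","Long-tshirt",18
"T-shirt","short-tshirt",19
"T-shirt","half-tshirt",20
"top","very-nice",45
"top","not-nice",56
'''


def group_rows(csv_file):
    """Group CSV rows by their first column; sizes are converted to int."""
    grouped = defaultdict(list)
    for row in csv.reader(csv_file):
        if row:  # skip blank lines
            category, name, size = row
            grouped[category].append({"name": name, "size": int(size)})
    return dict(grouped)


result = group_rows(io.StringIO(CSV_TEXT))
print(json.dumps(result, indent=2))
```

To read from a real file, pass the object returned by open("test.csv", newline="") in place of the io.StringIO wrapper.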

import json
json_string = json.dumps(your_dict)
You now have a string containing JSON-formatted data from your original dictionary. Is that what you wanted?

Related

Convert unformatted JSON file to CSV

I am trying to convert a JSON file into CSV. The issue is JSON file is not formatted uniformly.
{
"attributes": {
"type": "Lead",
"url": "xyz"
},
"FirstName": "Bradford",
"LastName": "Cosenza",
"School_District__c": "Ross County",
"Status": "Open",
"CreatedDate": "2022-12-21T16:34:35.000+0000",
"Email": "something#something.com",
"Lead_ID__c": "00Q3b0000212gxh",
"Id": "00Q3b0000212gxhEAA"
},
{
"attributes": {
"type": "Lead",
"url": "xyz"
},
"FirstName": "Bradford",
"LastName": "Cosenza",
"School_District__c": "Ross County",
"Status": "Open",
"CreatedDate": "2020-03-31T23:25:03.000+0000",
"Verification_Status__c": "Invalid",
"Verification_Date__c": "2022-08-05",
"Email": "something#something.com",
"Lead_ID__c": "00Q3b00001t0uNf",
"Id": "00Q3b00001t0uNfEAI"
},
Here is a snippet from the JSON file; note that Verification_Status__c and Verification_Date__c are missing from the first entry.
I used this code:
import json
import csv
# Open the JSON file & load its data
with open('duplicate.json') as dat_file:
    data = json.load(dat_file)
stud_data = data['records']
# Opening a CSV file for writing in write mode
data_file = open('data_file.csv', 'w')
csv_writer = csv.writer(data_file)
count = 0
for cnt in stud_data:
    if count == 0:
        header = cnt.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(cnt.values())
data_file.close()
but I am getting scrambled data in the CSV file.
You can use a csv.DictWriter if the keys appear in a different order in different records, or if some keys are missing from some records.
If there are nested objects in the JSON, they need to be flattened before they can be exported to the CSV output.
A DictWriter needs the full set of keys when it is created, so you can either define a fixed list up front or make two passes over the data: the first pass finds the full set of keys and the second pass writes the CSV file.
import json
import csv
data = """{
"records":[
{
"attributes": {
"type": "Lead",
"url": "xyz"
},
"FirstName": "Bradford",
"LastName": "Cosenza",
"School_District__c": "Ross County",
"Status": "Open",
"CreatedDate": "2022-12-21T16:34:35.000+0000",
"Email": "something#something.com",
"Lead_ID__c": "00Q3b0000212gxh",
"Id": "00Q3b0000212gxhEAA"
},
{
"attributes": {
"type": "Lead",
"url": "xyz"
},
"FirstName": "Bradford",
"LastName": "Cosenza",
"School_District__c": "Ross County",
"Status": "Open",
"CreatedDate": "2020-03-31T23:25:03.000+0000",
"Verification_Status__c": "Invalid",
"Verification_Date__c": "2022-08-05",
"Email": "something#something.com",
"Lead_ID__c": "00Q3b00001t0uNf",
"Id": "00Q3b00001t0uNfEAI"
}
]}"""
# full set of keys in JSON for the CSV columns
keys = ["Id",
"FirstName",
"LastName",
"School_District__c",
"Status",
"CreatedDate",
"Verification_Status__c",
"Verification_Date__c",
"Email",
"Lead_ID__c",
"type",
"url"
]
Next, convert the data to a list of dictionary objects
and write the output to a CSV file.
# Load the JSON data
# use json.loads() to load from string or json.load() to load from file
data = json.loads(data)
stud_data = data['records']
# Open the CSV file for writing
with open('data_file.csv', 'w', newline='') as data_file:
    csv_writer = csv.DictWriter(data_file, fieldnames=keys)
    csv_writer.writeheader()
    for row in stud_data:
        # flatten the sub-elements in attributes object
        attrs = row.pop("attributes", None)
        if attrs:
            for k,v in attrs.items():
                row[k] = v
        csv_writer.writerow(row)
Output:
Id,FirstName,LastName,School_District__c,Status,CreatedDate,Verification_Status__c,Verification_Date__c,Email,Lead_ID__c,type,url
00Q3b0000212gxhEAA,Bradford,Cosenza,Ross County,Open,2022-12-21T16:34:35.000+0000,,,something#something.com,00Q3b0000212gxh,Lead,xyz
00Q3b00001t0uNfEAI,Bradford,Cosenza,Ross County,Open,2020-03-31T23:25:03.000+0000,Invalid,2022-08-05,something#something.com,00Q3b00001t0uNf,Lead,xyz
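The two-pass alternative mentioned above can be sketched as follows, in case you would rather not maintain the key list by hand. This is only an illustration: the helper names flatten and records_to_csv are made up for this example, the records are a shortened version of the sample data, and it writes to an in-memory buffer rather than data_file.csv.

```python
import csv
import io


def flatten(record, nested_key="attributes"):
    """Return a copy of record with the nested dict merged into the top level."""
    flat = {k: v for k, v in record.items() if k != nested_key}
    flat.update(record.get(nested_key) or {})
    return flat


def records_to_csv(records, out_file):
    rows = [flatten(r) for r in records]
    # First pass: collect every key, keeping first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # Second pass: write the rows; keys missing from a row become empty cells.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Shortened sample records (only a few of the original fields).
records = [
    {"attributes": {"type": "Lead", "url": "xyz"},
     "FirstName": "Bradford", "Id": "00Q3b0000212gxhEAA"},
    {"attributes": {"type": "Lead", "url": "xyz"},
     "FirstName": "Bradford", "Verification_Status__c": "Invalid",
     "Id": "00Q3b00001t0uNfEAI"},
]

buffer = io.StringIO()
columns = records_to_csv(records, buffer)
print(buffer.getvalue())
```

With this approach the column order follows the order in which keys are first seen, so a key that only appears in later records ends up at the end of the header. Use a fixed list if you need a specific order.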

Python - normalizing nested json file

I have a nested JSON file and I am trying to get the data into a DataFrame. I have to extract sensor-time, then the elements, and finally sensor-info.
Here is what the JSON file looks like:
{
"sensor-time": {
"timezone": "America/Los_Angeles",
"time": "2019-11-21T01:00:04-08:00"
},
"status": {
"code": "OK"
},
"content": {
"element": [{
"element-id": 0,
"element-name": "Line 0",
"sensor-type": "SINGLE_SENSOR",
"data-type": "LINE",
"from": "2019-11-21T00:00:00-08:00",
"to": "2019-11-21T01:00:00-08:00",
"resolution": "ONE_HOUR",
"measurement": [{
"from": "2019-11-21T00:00:00-08:00",
"to": "2019-11-21T01:00:00-08:00",
"value": [{
"value": 0,
"label": "fw"
}, {
"value": 0,
"label": "bw"
}
]
}
]
}
]
},
"sensor-info": {
"serial-number": "D8:80:39:D9:6B:9B",
"ip-address": "192.168.0.3",
"name": "XD01",
"group": "Boost Mobile",
"device-type": "PC2"
}
}
And here is my code so far:
import json
from pandas.io.json import json_normalize
import glob
import urllib
import sqlalchemy as sa
# Create empty dataframe
# Drill through each file with json extension in the folder, open it, load it and parse it into dataframe
file = 'C:/Test/Loading/testfile.json'
with open(file) as json_file:
    json_data = json.load(json_file)
df = json_normalize(json_data, meta=['sensor-time'])
df
and here is the output when I run my code:
I tried using the flatten_json library, and the best I can get is with this code:
with open(file) as json_file:
    json_data = json.load(json_file)
flat = flatten_json(json_data)
df = json_normalize(flat)
And I get output with one row and 33 columns. Since I have multiple values under the measurement part of the JSON file, I get a column for each of the measurements. What I need is 3 rows with 24 columns, one row for each measurement.
So how do I modify this?
I think the simplest way would be to use pandas.DataFrame(json_data); you can then access the individual pieces of information like this:
pandas.DataFrame(json_data)['sensor-time']['time']
pandas.DataFrame(json_data)['content']['element']
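If you need one row per measurement, another option is to build the rows yourself with plain loops and only hand the result to pandas at the end. The sketch below is one possible way to do that, not the only one: the function name measurement_rows, the "sensor-" prefix for sensor-info fields, the measurement-from/measurement-to column names, and the choice to turn each value label (fw, bw) into its own column are all my assumptions. The sample document is a trimmed version of the one in the question.

```python
def measurement_rows(doc):
    """Yield one flat dict per measurement in a sensor document."""
    shared = {
        "sensor-time": doc["sensor-time"]["time"],
        "timezone": doc["sensor-time"]["timezone"],
    }
    # Prefix sensor-info keys so that "name" etc. cannot clash with other fields.
    shared.update({f"sensor-{k}": v for k, v in doc["sensor-info"].items()})

    for element in doc["content"]["element"]:
        # Copy the element's scalar fields onto every row it produces.
        base = dict(shared)
        base.update({k: v for k, v in element.items() if k != "measurement"})
        for measurement in element["measurement"]:
            row = dict(base)
            row["measurement-from"] = measurement["from"]
            row["measurement-to"] = measurement["to"]
            # Each labelled value becomes its own column (assumed layout).
            for item in measurement["value"]:
                row[item["label"]] = item["value"]
            yield row


# Trimmed version of the document from the question.
doc = {
    "sensor-time": {"timezone": "America/Los_Angeles",
                    "time": "2019-11-21T01:00:04-08:00"},
    "status": {"code": "OK"},
    "content": {"element": [{
        "element-id": 0,
        "element-name": "Line 0",
        "from": "2019-11-21T00:00:00-08:00",
        "to": "2019-11-21T01:00:00-08:00",
        "measurement": [{
            "from": "2019-11-21T00:00:00-08:00",
            "to": "2019-11-21T01:00:00-08:00",
            "value": [{"value": 0, "label": "fw"},
                      {"value": 0, "label": "bw"}],
        }],
    }]},
    "sensor-info": {"serial-number": "D8:80:39:D9:6B:9B", "name": "XD01"},
}

rows = list(measurement_rows(doc))
for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame(rows) should then give one DataFrame row per measurement.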

JSON Schema Generator Python

I am using this resource to generate the schema https://github.com/wolverdude/GenSON/
I have the below JSON File
{
'name':'Sam',
},
{
'name':'Jack',
}
so on ...
I am wondering how to iterate over a large JSON file. I want to parse each JSON file and pass it to GenSON to generate the schema:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"name": {
"type": [
"string"
]
}
},
"required": [
"name"
]
}
I think you should:
import json
from genson import SchemaBuilder
builder = SchemaBuilder()
with open(filename, 'r') as f:
    datastore = json.load(f)
builder.add_object(datastore)
builder.to_schema()
Where filename is your file path.

Parsing JSON data if a key value is matched and print a key value in Python

I am very much new to JSON parsing. Below is my JSON:
[
{
"description": "Newton",
"exam_code": {
"date_added": "2015-05-13T04:49:54+00:00",
"description": "Production",
"exam_tags": [
{
"date_added": "2012-01-13T03:39:17+00:00",
"descriptive_name": "Production v0.1",
"id": 1,
"max_count": "147",
"name": "Production"
}
],
"id": 1,
"name": "Production",
"prefix": "SA"
},
"name": "CM"
},
{
"description": "Opera",
"exam_code": {
"date_added": "2015-05-13T04:49:54+00:00",
"description": "Production",
"test_tags": [
{
"date_added": "2012-02-22T12:44:55+00:00",
"descriptive_name": "Production v0.1",
"id": 1,
"max_count": "147",
"name": "Production"
}
],
"id": 1,
"name": "Production",
"prefix": "SA"
},
"name": "OS"
}
]
Here I am trying to print the description value if the name value is CM, and likewise print the description value if the name value is OS.
Please help me understand how JSON parsing can be done.
Assuming you have already read the JSON string from somewhere (a file, stdin, or any other source), you can deserialize it into a Python object by doing:
import json
# ...
json_data = json.loads(json_str)
Where json_str is the JSON string that you want to parse.
In your case, json_str will get deserialized into a Python list, so you can do any operation on it as you'd normally do with a list.
Of course, this includes iterating over the elements:
for item in json_data:
    if item.get('name') in ('CM', 'OS'):
        print(item['description'])
As you can see, the items in json_data have been deserialized into dict, so you can access the actual fields using dict operations.
Note
You can also deserialize JSON from the source directly, provided you have access to the file handle or stream:
# Loading from a file
import json
with open('my_json.json', 'r') as fd:
    # Note that we're using json.load, not json.loads
    json_data = json.load(fd)
# Loading from stdin
import json, sys
json_data = json.load(sys.stdin)
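If you need to look up descriptions by name more than once, it can be convenient to build a dictionary from the deserialized list first. Here is a minimal sketch, assuming each name appears only once; the variable name description_by_name is made up for this example, and the JSON is cut down to the two fields that matter here.

```python
import json

# Shortened version of the JSON from the question.
json_str = '''[
  {"description": "Newton", "name": "CM"},
  {"description": "Opera", "name": "OS"}
]'''

items = json.loads(json_str)

# Map each name to its description; items without a "name" key are skipped.
description_by_name = {
    item["name"]: item["description"] for item in items if "name" in item
}

print(description_by_name.get("CM"))
print(description_by_name.get("XX", "not found"))
```

Using .get() with a default avoids a KeyError when the name is not present in the data.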

Python: How to search and replace parts of a json file?

I'm new to Python and I would like to search and replace the titles of IDs in a JSON file. Normally I would use R for this task, but how can I do it in Python? Here is a sample of my JSON code (with a service ID and a layer ID). I'm interested in replacing the titles in the layer IDs:
...{"services": [
{
"id": "service",
"url": "http://...",
"title": "GEW",
"layers": [
{
"id": "0",
"title": "wrongTitle",
},
{
"id": "1",
"title": "againTitleWrong",
},
],
"options": {}
},],}
For the replace I would use a table/csv like this:
serviceID layerID oldTitle newTitle
service 0 wrongTitle newTitle1
service 1 againTitleWrong newTitle2
....
Do you have ideas? Thanks
Here's a working example on repl.it.
Code:
import json
import io
import csv
### json input
input = """
{
"layers": [
{
"id": "0",
"title": "wrongTitle"
},
{
"id": "1",
"title": "againTitleWrong"
}
]
}
"""
### parse the json
parsed_json = json.loads(input)
#### csv input
csv_input = """serviceID,layerID,oldTitle,newTitle
service,0,wrongTitle,newTitle1
service,1,againTitleWrong,newTitle2
"""
### parse csv and generate a correction lookup
parsed_csv = csv.DictReader(io.StringIO(csv_input))
lookup = {}
for row in parsed_csv:
    lookup[row["layerID"]] = row["newTitle"]
#correct and print json
layers = parsed_json["layers"]
for layer in layers:
    layer["title"] = lookup[layer["id"]]
parsed_json["layers"] = layers
print(json.dumps(parsed_json))
You don't say which version of Python you are using, but there are built-in JSON parsers for the language.
For 2.x: https://docs.python.org/2.7/library/json.html
For 3.x: https://docs.python.org/3.4/library/json.html
These should be able to help you to parse the JSON and replace what you want.
As other users suggested, the json module will be helpful.
Here is a basic example for Python 2.7:
import json
j = '''{
"services":
[{
"id": "service",
"url": "http://...",
"title": "GEW",
"options": {},
"layers": [
{
"id": "0",
"title": "wrongTitle"
},
{
"id": "1",
"title": "againTitleWrong"
}
]
}]
}'''
s = json.loads(j)
s["services"][0]["layers"][0]["title"] = "new title"
# save json object to file
with open('file.json', 'w') as f:
    json.dump(s, f)
You can index the element and change its title according to your CSV file, which requires the csv module.
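To tie the two together, here is a rough sketch that walks the nested services/layers structure and applies the replacements from the table. It assumes the table is saved as comma-separated text with the header shown in the question, and it only replaces a title when the service ID, layer ID, and old title all match; that matching rule is my own choice, so loosen it if you only want to match on the IDs.

```python
import csv
import io
import json

# Cleaned-up version of the JSON from the question (trailing commas removed).
doc = json.loads('''{"services": [{"id": "service", "title": "GEW", "layers": [
    {"id": "0", "title": "wrongTitle"},
    {"id": "1", "title": "againTitleWrong"}]}]}''')

# Assumed comma-separated form of the replacement table.
csv_text = """serviceID,layerID,oldTitle,newTitle
service,0,wrongTitle,newTitle1
service,1,againTitleWrong,newTitle2
"""

# Build a lookup keyed on (serviceID, layerID, oldTitle).
lookup = {
    (row["serviceID"], row["layerID"], row["oldTitle"]): row["newTitle"]
    for row in csv.DictReader(io.StringIO(csv_text))
}

for service in doc["services"]:
    for layer in service.get("layers", []):
        key = (service["id"], layer["id"], layer["title"])
        # Keep the current title when the table has no matching entry.
        layer["title"] = lookup.get(key, layer["title"])

print(json.dumps(doc, indent=2))
```

The service title ("GEW") is left untouched because only the layer titles are looked up.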
