Convert CSV to restructured JSON in Python - python

I have a CSV File with following contents:
source: data.opennepal.net
District,Zone,Geographical Region,Development Region,Causalities,In Number
Sindhupalchok,Bagmati,Mountain,Central,Total No. of Houses,66688
Sindhupalchok,Bagmati,Mountain,Central,Total Population,287798
Sindhupalchok,Bagmati,Mountain,Central,Dead Male,1497
Sindhupalchok,Bagmati,Mountain,Central,Dead Female,1943
Kathmandu,Bagmati,Hill,Central,Total No. of Houses,436344
Kathmandu,Bagmati,Hill,Central,Total Population,1744240
Kathmandu,Bagmati,Hill,Central,Dead Male,621
Kathmandu,Bagmati,Hill,Central,Dead Female,600
My objective is to generate a JSON object like this from it:
{
"district":{
"Sindhupalchok":{
"Causalities":{
"Total No. of Houses":66688,
"Total Population":287798,
"Dead Male":1497,
"Dead Female":1943
},
"geoInfo":{
"Zone":"Bagmati",
"geography":"Mountain",
"Dev Region":"Central"
}
},
"Kathmandu":{
"Causalities":{
"Total No. of Houses":436344,
"Total Population":1744240,
"Dead Male":621,
"Dead Female":600
},
"geoInfo":{
"Zone":"Bagmati",
"geography":"Hill",
"Dev Region":"Central"
}
}
}
}
I've tried using csv.DictReader(csvfile, fieldnames) but it generates redundant nodes in JSON which is difficult to parse and unnecessarily lenghty.
I am using python 2.x
This is my attempt so far:
>>> csvData = open('data.csv','rb')
>>> fieldnames = ("district", "zone", "geographicalRegion", "developmentRegion", "causalities", "injuredNumber")
>>> reader = csv.DictReader(csvData, fieldnames)
>>> rawJson = json.dumps([ row for row in reader ])
rawJson isn't the one I've been seeking. It just maps the fieldnames with individual datasets.
So the question is: How can I create this JSON object without redundant nodes?

As glibdud mentions in the comments you need to loop over the data a bit more manually, so that you can create the desired JSON structure.
We read each line of the CSV data as a dict, and check if we've encountered a new district, and if so we create a new data dict for it, and insert a geoInfo dict into data. Then we can gather the casualty data from that line and the subsequent lines for that district. And once we've gathered all that data we can insert the data dict into the main all_data dict.
To test the code I put your .csv data into a file called 'qdata.csv'
import csv
import json
filename = 'qdata.csv'
fieldnames = ('district', 'Zone', 'geography',
'Dev Region', 'casualties', 'injured')
geo_keys = ('Zone', 'geography', 'Dev Region')
all_data = {}
with open(filename, 'rb') as csvfile:
reader = csv.DictReader(csvfile, fieldnames)
# skip header
next(reader)
current_district = None
for row in reader:
district = row['district']
if district != current_district:
current_district = district
data = all_data[district] = {}
casualties = data['Casualties'] = {}
data['geoInfo'] = dict((k, row[k]) for k in geo_keys)
casualties[row['casualties']] = row['injured']
print json.dumps(all_data, indent=4, sort_keys=True)
Output
{
"Kathmandu": {
"Casualties": {
"Dead Female": "600",
"Dead Male": "621",
"Total No. of Houses": "436344",
"Total Population": "1744240"
},
"geoInfo": {
"Dev Region": "Central",
"Zone": "Bagmati",
"geography": "Hill"
}
},
"Sindhupalchok": {
"Casualties": {
"Dead Female": "1943",
"Dead Male": "1497",
"Total No. of Houses": "66688",
"Total Population": "287798"
},
"geoInfo": {
"Dev Region": "Central",
"Zone": "Bagmati",
"geography": "Mountain"
}
}
}
This output isn't exactly what you've got in your question, but I think you should be able to take it from here. :)

Related

Convert CSV file to JSON with python

I am trying to covert my CSV email list to a JSON format to mass email via API. This is my code thus far but am having trouble with the output. Nothing is outputting on my VS code editor.
import csv
import json
def make_json(csvFilePath, jsonFilePath):
data = {}
with open(csvFilePath, encoding='utf-8') as csvf:
csvReader = csv.DictReader(csvf)
for rows in csvReader:
key = rows['No']
data[key] = rows
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(data, indent=4))
csvFilePath = r'/data/csv-leads.csv'
jsonFilePath = r'Names.json'
make_json(csvFilePath, jsonFilePath)
Here is my desired JSON format
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"Name": "Youngstown Coffee",
"ConsentToTrack": "Yes"
},
Heres my CSV list
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen & Bakery,catering#zylberschtein.com,Yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes
It looks like you could use a csv.DictReader to make this easier.
If I have data.csv that looks like this:
Name,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
I can convert it into JSON like this:
>>> import csv
>>> import json
>>> fd = open('data.csv')
>>> reader = csv.DictReader(fd)
>>> print(json.dumps(list(reader), indent=2))
[
{
"Name": "Zylberschtein's Delicatessen",
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes"
},
{
"Name": "Youngstown Coffee",
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes"
}
]
Here I've assumed the headers in the CSV can be used verbatim. I'll update this with an exmaple if you need to modify key names (e.g. convert "No" to "Name"),.
If you need to rename a column, it might look more like this:
import csv
import json
with open('data.csv') as fd:
reader = csv.DictReader(fd)
data = []
for row in reader:
row['Name'] = row.pop('No')
data.append(row)
print(json.dumps(data, indent=2))
Given this input:
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
This will output:
[
{
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes",
"Name": "Zylberschtein's Delicatessen"
},
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes",
"Name": "Youngstown Coffee"
}
]
and to print on my editor is it simply print(json.dumps(list(reader), indent=2))?
I'm not really familiar with your editor; print is how you generate console output in Python.

Write all values in one line csv.DictWriter

I'm having trouble to generate a well formatted CSV file out of some data i fetched from the leadfeeder API. In the csv file that is currently being created, not all values are in one row, id and leads are one column higher then the rest. Like here:
CSV Output
I later also like to load another json file and use it to map some values over the id and then put also the visits per lead into my csv file.
Do you also have some advice for this?
This is my code so far:
import json
import csv
csv_columns = ['name', 'industry', 'website_url', 'status', 'crm_lead_id', 'crm_organization_id', 'employee_count', 'id', 'type' ]
with open('data.json', 'r') as d:
d = json.load(d)
csv_file = 'lead_daten.csv'
try:
with open('leads.csv', 'w', newline='') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore')
writer.writeheader()
for item in d['data']:
writer.writerow(item)
writer.writerow(item['attributes'])
except IOError:
print("I/O error")
My json data has the following structure:
I need also some of the nested values like the id in relationships!
{
"data": [
{
"attributes": {
"crm_lead_id": null,
"crm_organization_id": null,
"employee_count": 5000,
"facebook_url": null,
"first_visit_date": "2019-01-31",
"industry": "Furniture",
"last_visit_date": "2019-01-31",
"linkedin_url": null,
"name": "Example Inc",
"phone": null,
"status": "new",
"twitter_handle": "example",
"website_url": "http://www.example.com"
},
"id": "s7ybF6VxqhQqVM1m1BCnZT_8SRo9XnuoxSUP5ChvERZS9",
"relationships": {
"location": {
"data": {
"id": "8SRo9XnuoxSUP5ChvERZS9",
"type": "locations"
}
}
},
"type": "leads"
},
{
"attributes": {
"crm_lead_id": null,
When you write to a csv, you must write one full row at a time. You current code writes one row with only id and type, and then a different row with the other fields.
The correct way is to first fully build a dictionary containing all the fields and only then write it in one single operation. Code could be:
...
writer.writeheader()
for item in d['data']:
item.update(item["attributes"])
writer.writerow(item)
...

Convert a complex layered JSON to CSV

I am trying to parse through JSON code and write the results into a csv file. The "name" values are supposed to be the column headers and the 'value' values are what need to be stored.This is my code. the CSV file writer does not separate the strings with commas: eventIdlistingsvenueperformer and when I try to do something like: header = col['name']+',' I get: eventId","listings","venue","performer And it isn't read as a csv file so...My questions are: am I going about this right? and how could I separate the strings by commas?
"results": [
{
"columns": [
{
"name": "eventId",
"value": "XXXX",
"defaultHidden": false
},
{
"name": "listings",
"value": "8",
"defaultHidden": false
},
{
"name": "venue",
"value": "Nationwide Arena",
"defaultHidden": false
}]
this is my code:
json_decode=json.loads(data)
report_result = json_decode['results']
with open('testReport2.csv','w') as result_data:
csvwriter = csv.writer(result_data,delimiter=',')
count = 0
for res in report_result:
deeper = res['columns']
for col in deeper:
if count == 0:
header = col['name']
csvwriter.writerow([header,])
count += 1
for written in report_result:
deeper =res['columns']
for col in deeper:
csvwriter.writerow([trouble,])
result_data.close()
try below code:
json_decode=json.loads(data)
report_result = json_decode['results']
new_dict = {}
for result in report_result:
columns = result["columns"]
for value in columns:
new_dict[value['name']] = value['value']
with open('testReport2.csv','w') as result_data:
csvwriter = csv.DictWriter(result_data,delimiter=',',fieldnames=new_dict.keys())
csvwriter.writeheader()
csvwriter.writerow(new_dict)
Try this:
json_decode=json.loads(data)
report_result = json_decode['results']
with open('testReport2.csv','w') as result_data:
csvwriter = csv.writer(result_data,delimiter=',')
header = list(report_result[0]['columns'][0].keys())
csvwriter.writerow(header)
for written in report_result:
for row in written['columns']:
deeper =row.values()
csvwriter.writerow(deeper)
result_data.close()

Python CSV to JSON W/ Array Output

I'm trying to take data from a CSV and put it in a top-level array in JSON format.
Currently I am running this code:
import csv
import json
csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("ID","Artist","Song", "Artist")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
json.dump(row, jsonfile)
jsonfile.write('\n')
The CSV file is formatted as so:
| 1 | Empire of the Sun | We Are The People | Walking on a Dream |
| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |
Where = Column 1: ID | Column 2: Artist | Column 3: Song | Column 4: Album
And getting this output:
{"Song": "Empire of the Sun", "ID": "1", "Artist": "Walking on a Dream"}
{"Song": "M83", "ID": "2", "Artist": "Hurry Up We're Dreaming"}
I'm trying to get it to look like this though:
{
"Music": [
{
"id": 1,
"Artist": "Empire of the Sun",
"Name": "We are the People",
"Album": "Walking on a Dream"
},
{
"id": 2,
"Artist": "M83",
"Name": "Steve McQueen",
"Album": "Hurry Up We're Dreaming"
},
]
}
Pandas solves this really simply. First to read the file
import pandas
df = pandas.read_csv('music.csv', names=("id","Artist","Song", "Album"))
Now you have some options. The quickest way to get a proper json file out of this is simply
df.to_json('file.json', orient='records')
Output:
[{"id":1,"Artist":"Empire of the Sun","Song":"We Are The People","Album":"Walking on a Dream"},{"id":2,"Artist":"M83","Song":"Steve McQueen","Album":"Hurry Up We're Dreaming"}]
This doesn't handle the requirement that you want it all in a "Music" object or the order of the fields, but it does have the benefit of brevity.
To wrap the output in a Music object, we can use to_dict:
import json
with open('file.json', 'w') as f:
json.dump({'Music': df.to_dict(orient='records')}, f, indent=4)
Output:
{
"Music": [
{
"id": 1,
"Album": "Walking on a Dream",
"Artist": "Empire of the Sun",
"Song": "We Are The People"
},
{
"id": 2,
"Album": "Hurry Up We're Dreaming",
"Artist": "M83",
"Song": "Steve McQueen"
}
]
}
I would advise you to reconsider insisting on a particular order for the fields since the JSON specification clearly states "An object is an unordered set of name/value pairs" (emphasis mine).
Alright this is untested, but try the following:
import csv
import json
from collections import OrderedDict
fieldnames = ("ID","Artist","Song", "Artist")
entries = []
#the with statement is better since it handles closing your file properly after usage.
with open('music.csv', 'r') as csvfile:
#python's standard dict is not guaranteeing any order,
#but if you write into an OrderedDict, order of write operations will be kept in output.
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
entry = OrderedDict()
for field in fieldnames:
entry[field] = row[field]
entries.append(entry)
output = {
"Music": entries
}
with open('file.json', 'w') as jsonfile:
json.dump(output, jsonfile)
jsonfile.write('\n')
Your logic is in the wrong order. json is designed to convert a single object into JSON, recursively. So you should always be thinking in terms of building up a single object before calling dump or dumps.
First collect it into an array:
music = [r for r in reader]
Then put it in a dict:
result = {'Music': music}
Then dump to JSON:
json.dump(result, jsonfile)
Or all in one line:
json.dump({'Music': [r for r in reader]}, jsonfile)
"Ordered" JSON
If you really care about the order of object properties in the JSON (even though you shouldn't), you shouldn't use the DictReader. Instead, use the regular reader and create OrderedDicts yourself:
from collections import OrderedDict
...
reader = csv.Reader(csvfile)
music = [OrderedDict(zip(fieldnames, r)) for r in reader]
Or in a single line again:
json.dump({'Music': [OrderedDict(zip(fieldnames, r)) for r in reader]}, jsonfile)
Other
Also, use context managers for your files to ensure they're closed properly:
with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
# Rest of your code inside this block
It didn't write to the JSON file in the order I would have liked
The csv.DictReader classes return Python dict objects. Python dictionaries are unordered collections. You have no control over their presentation order.
Python does provide an OrderedDict, which you can use if you avoid using csv.DictReader().
and it skipped the song name altogether.
This is because the file is not really a CSV file. In particular, each line begins and ends with the field separator. We can use .strip("|") to fix this.
I need all this data to be output into an array named "Music"
Then the program needs to create a dict with "Music" as a key.
I need it to have commas after each artist info. In the output I get I get
This problem is because you call json.dumps() multiple times. You should only call it once if you want a valid JSON file.
Try this:
import csv
import json
from collections import OrderedDict
def MyDictReader(fp, fieldnames):
fp = (x.strip().strip('|').strip() for x in fp)
reader = csv.reader(fp, delimiter="|")
reader = ([field.strip() for field in row] for row in reader)
dict_reader = (OrderedDict(zip(fieldnames, row)) for row in reader)
return dict_reader
csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("ID","Artist","Song", "Album")
reader = MyDictReader(csvfile, fieldnames)
json.dump({"Music": list(reader)}, jsonfile, indent=2)

Converting a dictionary to a string

I am having trouble with converting a dictionary to a string in python. I am trying to extract the information from one of my variables but cannot seem to remove the square brackets surrounding the information
for line in str(object):
if line.startswith ('['):
new_object = object.replace('[', '')
Is there a way to remove the square brackets or do I have to find another way of taking the information out of the dictionary?
Edit:
in more detail what i am trying to do here is the following
import requests
city = 'dublin'
country = 'ireland'
info = requests.get('http://api.openweathermap.org/data/2.5/weather?q='+city +','+ country +'&mode=json')
weather = info.json()['weather']
fh = open('/home/Ricky92d3/city.txt', 'w')
fh.write(str(weather))
fh.close()
fl = open('/home/Ricky92d3/city.txt')
Object = fl.read()
fl.close()
for line in str(Object):
if line.startswith ('['):
new_Object = Object.replace('[', '')
if line.startswith ('{'):
new_Object = Object.replace('{u', '')
print new_Object
i hope this makes what i am trying to do a little more clear
The object returned by info.json() is a Python dictionary, so you can access its contents using normal Python syntax. I admit that it can get a little bit tricky, since JSON dictionaries often contain other dictionaries and lists, but it's generally not too hard to figure out what's what if you print the JSON object out in a nicely formatted way. The easiest way to do that is by using the dumps() function in the standard Python json module.
The code below retrieves the JSON data into a dict called data.
It then prints the 'description' string from the list in the 'weather' item of data.
It then saves all the data (not just the 'weather' item) as an ASCII-encoded JSON file.
It then reads the JSON data back in again to a new dict called newdata, and pretty-prints it.
Finally, it prints the weather description again, to verify that we got back what we saw earlier. :)
import requests, json
#The base URL of the weather service
endpoint = 'http://api.openweathermap.org/data/2.5/weather'
#Filename for saving JSON data to
fname = 'data.json'
city = 'dublin'
country = 'ireland'
params = {
'q': '%s,%s' % (city, country),
'mode': 'json',
}
#Fetch the info
info = requests.get(endpoint, params=params)
data = info.json()
#print json.dumps(data, indent=4)
#Extract the value of 'description' from the list in 'weather'
print '\ndescription: %s\n' % data['weather'][0]['description']
#Save data
with open(fname, 'w') as f:
json.dump(data, f, indent=4)
#Reload data
with open(fname, 'r') as f:
newdata = json.load(f)
#Show all the data we just read in
print json.dumps(newdata, indent=4)
print '\ndescription: %s\n' % data['weather'][0]['description']
output
description: light intensity shower rain
{
"clouds": {
"all": 75
},
"name": "Dublin",
"visibility": 10000,
"sys": {
"country": "IE",
"sunset": 1438374108,
"message": 0.0118,
"type": 1,
"id": 5237,
"sunrise": 1438317600
},
"weather": [
{
"description": "light intensity shower rain",
"main": "Rain",
"id": 520,
"icon": "09d"
}
],
"coord": {
"lat": 53.340000000000003,
"lon": -6.2699999999999996
},
"base": "stations",
"dt": 1438347600,
"main": {
"pressure": 1014,
"humidity": 62,
"temp_max": 288.14999999999998,
"temp": 288.14999999999998,
"temp_min": 288.14999999999998
},
"id": 2964574,
"wind": {
"speed": 8.1999999999999993,
"deg": 210
},
"cod": 200
}
description: light intensity shower rain
I'm not quite sure what you're trying to do here (without seeing your dictionary) but if you have a string like x = "[myString]" you can just do the following:
x = x.replace("[", "").replace("]", "")
If this isn't working, there is a high chance you're actually getting a list returned. Though if that was the case you should see an error like this:
>>> x = [1,2,3]
>>> x.replace("[", "")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'replace'
Edit 1:
I think there's a misunderstanding of what you're getting back here. If you're just looking for a csv output file with the weather from your api try this:
import requests
import csv
city = 'dublin'
country = 'ireland'
info = requests.get('http://api.openweathermap.org/data/2.5/weather?q='+city +','+ country +'&mode=json')
weather = info.json()['weather']
weather_fieldnames = ["id", "main", "description", "icon"]
with open('city.txt', 'w') as f:
csvwriter = csv.DictWriter(f, fieldnames=weather_fieldnames)
for w in weather:
csvwriter.writerow(w)
This works by looping through the list of items you're getting and using a csv.DictWriter to write it as a row in the csv file.
Bonus
Don't call your dictionary object - It's a reserved word for the core language.

Categories