Python CSV to JSON W/ Array Output - python

I'm trying to take data from a CSV and put it in a top-level array in JSON format.
Currently I am running this code:
import csv
import json
csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("ID","Artist","Song", "Artist")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
The CSV file is formatted as so:
| 1 | Empire of the Sun | We Are The People | Walking on a Dream |
| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |
Where = Column 1: ID | Column 2: Artist | Column 3: Song | Column 4: Album
And getting this output:
{"Song": "Empire of the Sun", "ID": "1", "Artist": "Walking on a Dream"}
{"Song": "M83", "ID": "2", "Artist": "Hurry Up We're Dreaming"}
I'm trying to get it to look like this though:
{
"Music": [
{
"id": 1,
"Artist": "Empire of the Sun",
"Name": "We are the People",
"Album": "Walking on a Dream"
},
{
"id": 2,
"Artist": "M83",
"Name": "Steve McQueen",
"Album": "Hurry Up We're Dreaming"
},
]
}

Pandas solves this really simply. First to read the file
import pandas
df = pandas.read_csv('music.csv', names=("id","Artist","Song", "Album"))
Now you have some options. The quickest way to get a proper json file out of this is simply
df.to_json('file.json', orient='records')
Output:
[{"id":1,"Artist":"Empire of the Sun","Song":"We Are The People","Album":"Walking on a Dream"},{"id":2,"Artist":"M83","Song":"Steve McQueen","Album":"Hurry Up We're Dreaming"}]
This doesn't handle the requirement that you want it all in a "Music" object or the order of the fields, but it does have the benefit of brevity.
To wrap the output in a Music object, we can use to_dict:
import json
with open('file.json', 'w') as f:
    json.dump({'Music': df.to_dict(orient='records')}, f, indent=4)
Output:
{
"Music": [
{
"id": 1,
"Album": "Walking on a Dream",
"Artist": "Empire of the Sun",
"Song": "We Are The People"
},
{
"id": 2,
"Album": "Hurry Up We're Dreaming",
"Artist": "M83",
"Song": "Steve McQueen"
}
]
}
I would advise you to reconsider insisting on a particular order for the fields since the JSON specification clearly states "An object is an unordered set of name/value pairs" (emphasis mine).
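If a particular key order still matters for readability, here is a minimal sketch of one way to get it. It relies on the fact that, from Python 3.7 on, a plain dict keeps insertion order and json.dumps writes keys in that order. The names record and wanted_order are illustrative and not part of the answer above.

```python
import json

# A record whose keys arrive in an arbitrary order
record = {
    "Album": "Walking on a Dream",
    "id": 1,
    "Song": "We Are The People",
    "Artist": "Empire of the Sun",
}

# Rebuild the dict in the order we want; dicts keep insertion order (Python 3.7+)
wanted_order = ("id", "Artist", "Song", "Album")
ordered = {key: record[key] for key in wanted_order}

# json.dumps writes the keys in the dict's own order
text = json.dumps(ordered)
print(text)
```

Consumers of the file should still not depend on that order, for the reason given above.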

Alright this is untested, but try the following:
import csv
import json
from collections import OrderedDict
fieldnames = ("ID", "Artist", "Song", "Album")
entries = []
# The with statement is better since it closes your file properly after use.
with open('music.csv', 'r') as csvfile:
    # Before Python 3.7 the standard dict did not guarantee any order,
    # but an OrderedDict keeps the order in which keys were written.
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        entry = OrderedDict()
        for field in fieldnames:
            entry[field] = row[field]
        entries.append(entry)
output = {
    "Music": entries
}
with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile)
    jsonfile.write('\n')

Your logic is in the wrong order. json is designed to convert a single object into JSON, recursively. So you should always be thinking in terms of building up a single object before calling dump or dumps.
First collect it into an array:
music = [r for r in reader]
Then put it in a dict:
result = {'Music': music}
Then dump to JSON:
json.dump(result, jsonfile)
Or all in one line:
json.dump({'Music': [r for r in reader]}, jsonfile)
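Put together, the build-then-dump idea might look like the sketch below. It is self-contained, so it uses an in-memory io.StringIO as a stand-in for music.csv and assumes the data is comma-separated; sample_csv is an illustrative name.

```python
import csv
import io
import json

# Stand-in for music.csv (assumed to be comma-separated here)
sample_csv = io.StringIO(
    "1,Empire of the Sun,We Are The People,Walking on a Dream\n"
    "2,M83,Steve McQueen,Hurry Up We're Dreaming\n"
)
fieldnames = ("ID", "Artist", "Song", "Album")

reader = csv.DictReader(sample_csv, fieldnames)
# Build the complete structure first ...
result = {"Music": [dict(row) for row in reader]}
# ... then serialize it exactly once
text = json.dumps(result, indent=2)
print(text)
```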
"Ordered" JSON
If you really care about the order of object properties in the JSON (even though you shouldn't), you shouldn't use the DictReader. Instead, use the regular reader and create OrderedDicts yourself:
from collections import OrderedDict
...
reader = csv.reader(csvfile)
music = [OrderedDict(zip(fieldnames, r)) for r in reader]
Or in a single line again:
json.dump({'Music': [OrderedDict(zip(fieldnames, r)) for r in reader]}, jsonfile)
Other
Also, use context managers for your files to ensure they're closed properly:
with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    # Rest of your code inside this block

It didn't write to the JSON file in the order I would have liked
csv.DictReader returns each row as a Python dict. Before Python 3.7, dictionaries were unordered collections, so you had no control over the order in which their keys were written out. (From Python 3.7 on, a dict keeps its insertion order.)
Python does provide an OrderedDict, which you can use if you avoid using csv.DictReader().
and it skipped the song name altogether.
This is because the file is not really a CSV file. In particular, each line begins and ends with the field separator. We can use .strip("|") to fix this.
I need all this data to be output into an array named "Music"
Then the program needs to create a dict with "Music" as a key.
I need it to have commas after each artist info. In the output I get I get
This problem occurs because you call json.dump() once per row. You should call it only once if you want a valid JSON file.
Try this:
import csv
import json
from collections import OrderedDict
def MyDictReader(fp, fieldnames):
    fp = (x.strip().strip('|').strip() for x in fp)
    reader = csv.reader(fp, delimiter="|")
    reader = ([field.strip() for field in row] for row in reader)
    dict_reader = (OrderedDict(zip(fieldnames, row)) for row in reader)
    return dict_reader
csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("ID","Artist","Song", "Album")
reader = MyDictReader(csvfile, fieldnames)
json.dump({"Music": list(reader)}, jsonfile, indent=2)
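The desired output in the question also has a numeric id and a "Name" key instead of "Song", which the code above does not produce. A hedged sketch of that last step follows; the helper names parse_pipe_line and to_entry are made up for illustration, and it assumes no value contains a "|" character.

```python
import json

def parse_pipe_line(line):
    """Split '| a | b | c |' into ['a', 'b', 'c'] (assumes no '|' inside values)."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]

def to_entry(cells):
    # Key names copied from the desired output in the question
    ident, artist, song, album = cells
    return {"id": int(ident), "Artist": artist, "Name": song, "Album": album}

lines = [
    "| 1 | Empire of the Sun | We Are The People | Walking on a Dream |",
    "| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |",
]
music = {"Music": [to_entry(parse_pipe_line(line)) for line in lines]}
print(json.dumps(music, indent=2))
```

Converting the id with int() is what makes it appear unquoted in the JSON output.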

Related

Converting CSV file to .json file in a specific format using python

I have the input.csv file below and I'm having trouble converting it to a .json file. The Text field is in the Sinhala language.
Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-28,"['ඊයේ', 'ඊයේ']",Sports
2021-07-29,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
2021-07-29,"['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']",Sports
2021-08-01,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
The .json format that I want is shown below:
[
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-08-01",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
}
]
Below is what I tried. Since the text is in Sinhala, the values are shown in the form \u0d8a\u0dba\u0dda, which is one thing I'm struggling to sort out. The JSON structure is also not what I expect.
import csv
import json
def toJson():
    csvfile = open('outputS.csv', 'r', encoding='utf-8')
    jsonfile = open('file.json', 'w')
    fieldnames = ("date", "text", "category")
    reader = csv.DictReader(csvfile, fieldnames)
    out = json.dumps([row for row in reader])
    jsonfile.write(out)
if __name__ == '__main__':
    toJson()
Use ensure_ascii=False when doing json.dumps:
out = json.dumps([row for row in reader], ensure_ascii=False)
Other notes:
Since the first row of the csv contains the column names, you should either skip this first row, or let csv.DictReader use the first row as the column names automatically by not passing explicit values to fieldnames.
It's very bad practice to use open and then not close it.
To make things easier you can use a with statement.
The second column of the csv file will be treated as a string and not as a list of strings unless you specifically parse it as such (you can use literal_eval from the ast module for this).
You can use json.dump instead of json.dumps to write directly to the file.
With this, you can rewrite your function to:
def toJson():
    # Parenthesized context managers need Python 3.10 or newer.
    with (open('outputS.csv', 'r', encoding='utf-8') as csvfile,
          open('file.json', 'w', encoding='utf-8') as jsonfile):
        fieldnames = ("date", "text", "category")
        reader = csv.DictReader(csvfile, fieldnames)
        next(reader)  # skip header row
        json.dump([row for row in reader], jsonfile, ensure_ascii=False)
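The note about literal_eval can be sketched as follows. This is an illustration under assumptions rather than a drop-in replacement: it reads two of the sample rows from an in-memory string instead of a file, lets DictReader take the keys from the header row, and maps them onto the key names of the desired output.

```python
import ast
import csv
import io
import json

# Two rows from the question, held in memory instead of a file
sample = io.StringIO(
    "Date,Text,Category\n"
    "2021-07-28,\"['ලංකාව', 'ලංකාව']\",Sports\n"
    "2021-07-28,\"['ඊයේ', 'ඊයේ']\",Sports\n"
)

records = []
for row in csv.DictReader(sample):  # the header row supplies the keys
    records.append({
        "category": row["Category"],
        "date": row["Date"],
        # literal_eval turns the text "['a', 'b']" into a real list of strings
        "data": ast.literal_eval(row["Text"]),
    })

# ensure_ascii=False keeps the Sinhala text readable instead of \uXXXX escapes
out = json.dumps(records, ensure_ascii=False, indent=2)
```

When writing out to disk, open the target file with encoding='utf-8' so the non-ASCII characters are stored correctly.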
Alternatively, read your CSV with pandas (using pd.read_csv()), then use the to_dict function with the orient option set to 'records':
import pandas as pd
df = pd.read_csv('your_csv_file_name.csv')
df.to_dict(orient='records')

How can I loop through an array of arrays in a JSON object in Python

https://github.com/Asabeneh/30-Days-Of-Python/blob/ff24ab221faaec455b664ad5bbdc6e0de76c3caf/data/countries_data.json
How can I loop through this countries_data.json file (see link above) to get 'languages'?
I have tried:
import json
f = open("countries_data.json")
file = f.read()
# print(file)
for item in file:
    print(item)
You have almost everything set up correctly, but you didn't parse the JSON file. Also there is a double space in "f = open". You also didn't open the file with the read mode, although that isn't required because "r" is the default.
Correct code:
import json
f = open("countries_data.json", "r")
file = json.loads(f.read())
for item in file:
    print(item)
Hope this helped, always double check your code.
You can see that you import the json module at the beginning, so you might as well use it
If you go to the documentation you will see a function allowing you to read this file directly.
In the end you just have a list of dictionaries, so the code can be summarized as follows.
import json
with open("test/countries_data.json") as file:
    data = json.load(file)
for item in data:
    print(item["languages"])
You are missing one essential step, which is parsing the JSON data to Python datastructures.
import json
# read file
f = open("countries.json")
# parse JSON to Python datastructures
countries = json.load(f)
# now you have a list of countries
print(type(countries))
# loop through list of countries
for country in countries:
    # you can access languages with country["languages"]; JSON objects are Python dictionaries now
    print(type(country))
    for language in country["languages"]:
        print(language)
f.close()
Expected output:
<class 'list'>
<class 'dict'>
Pashto
Uzbek
Turkmen
...
You can use the json built-in package to deserialize the content of that file.
A sample of usage
data = """[
{
"name": "Afghanistan",
"capital": "Kabul",
"languages": [
"Pashto",
"Uzbek",
"Turkmen"
],
"population": 27657145,
"flag": "https://restcountries.eu/data/afg.svg",
"currency": "Afghan afghani"
},
{
"name": "Åland Islands",
"capital": "Mariehamn",
"languages": [
"Swedish"
],
"population": 28875,
"flag": "https://restcountries.eu/data/ala.svg",
"currency": "Euro"
}]"""
# deserializing
print(json.loads(data))
For more complex content, have a look at the JSONDecoder documentation.
EDIT:
import json
path = "countries_data.json"  # path to your file
with open(path, 'r') as fd:
    # iterate over the dictionaries
    for d in json.loads(fd.read()):
        print(d['languages'])
EDIT: extra - top 10 languages
import json
import itertools as it
path = "countries_data.json"  # path to your file
with open(path, 'r') as fd:
    text = fd.read()
languages_from_file = list(it.chain(*(d['languages'] for d in json.loads(text))))
# get unique "list" of languages
languages_all = set(languages_from_file)
# count the repeated languages
languages_count = {l: languages_from_file.count(l) for l in languages_all}
# order them per descending value
top_ten_languages = sorted(languages_count.items(), key=lambda k: k[1], reverse=True)[:10]
print(top_ten_languages)
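The counting step could also be written with collections.Counter, which avoids calling list.count once per language. This is only a sketch: the three-entry text below is a cut-down stand-in for countries_data.json, and the third entry is made up purely to produce a repeated language.

```python
import json
from collections import Counter

# Cut-down stand-in for countries_data.json; the "Example" entry is invented
text = """[
  {"name": "Afghanistan", "languages": ["Pashto", "Uzbek", "Turkmen"]},
  {"name": "Aland Islands", "languages": ["Swedish"]},
  {"name": "Example", "languages": ["Swedish", "Pashto"]}
]"""

countries = json.loads(text)
# Count every language across all countries in a single pass
counts = Counter(lang for country in countries for lang in country["languages"])
# most_common(n) returns up to n (language, count) pairs, highest count first
top_ten = counts.most_common(10)
print(top_ten)
```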

Read file and convert to JSON

I have a file which contains comma-separated data and I want the final output to be JSON. I tried the code below, but is there a better way to implement this?
data.txt
1,002, name, address
2,003, name_1, address_2
3,004, name_2, address_3
I want final output like below
{
"Id": "1",
"identifier": "002",
"mye": "name",
"add": "address"
}
{
"Id": "2",
"identifier": "003",
"mye": "name_2",
"add": "address_2"
}
and so on...
here is code which i am trying
list = []
with open('data.txt') as reader:
    for line in reader:
        list.append(line.split(','))
print(list)
The above just returns a list, but I need to convert it to the JSON key/value pairs defined above.
Your desired result isn't actually JSON. It's just a series of dict structures. What I think you want is a list of dictionaries. Try this:
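fields = ["Id", "identifier", "mye", "add"]
my_json = []
with open('data.txt') as reader:
    for line in reader:
        # drop the newline, then the spaces around each value
        vals = [val.strip() for val in line.rstrip().split(',')]
        # pair fields with values by position, so repeated values are handled correctly
        my_json.append({fields[i]: val for i, val in enumerate(vals)})
print(my_json)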
Something like this should work:
import json
dataList = []
with open('data.txt') as reader:
    # remove spaces, then split into lines; splitlines() leaves no empty trailing entry
    for line in reader.read().replace(' ', '').splitlines():
        lineData = line.split(',')
        dataList.append({
            "Id": lineData[0],
            "identifier": lineData[1],
            "mye": lineData[2],
            "add": lineData[3]
        })
out_json = json.dumps(dataList)
print(out_json)
Note that you can change this line:
out_json = json.dumps(dataList)
to
out_json = json.dumps(dataList, indent=4)
and change the indent value to format the json output.
And you can write it to a file if you want to:
with open("out.json", "w") as out_file:
    out_file.write(out_json)
This extends one of the suggestions above: you can consider zip instead of a dict comprehension.
import json
my_json = []
dict_header=["Id","identifier","mye","add"]
with open('data.txt') as fh:
    for line in fh:
        my_json.append(dict(zip(dict_header, line.split('\n')[0].split(','))))
out_file = open("test1.json", "w")
json.dump(my_json, out_file, indent = 4, sort_keys = False)
out_file.close()
This of course assumes you saved the file from Excel as text (tab delimited):
1 2 name address
2 3 name_1 address_2
3 4 name_2 address_3
I would suggest using pandas library, it can't get any easier.
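import pandas as pd
# dtype=str keeps leading zeros such as "002"; skipinitialspace drops the space after each comma
df = pd.read_csv("data.txt", header=None, dtype=str, skipinitialspace=True)
df.columns = ["Id","identifier", "mye","add"]
# orient="records" writes a list of objects, one per row
df.to_json("output.json", orient="records")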
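For comparison, the standard library's csv module can handle the space after each comma by itself. The sketch below assumes the same four field names used in the answers above and reads the sample rows from an in-memory string rather than data.txt.

```python
import csv
import io
import json

# Stand-in for data.txt
sample = io.StringIO(
    "1,002, name, address\n"
    "2,003, name_1, address_2\n"
    "3,004, name_2, address_3\n"
)
fields = ["Id", "identifier", "mye", "add"]

# skipinitialspace drops the blank that follows each comma in the input
reader = csv.reader(sample, skipinitialspace=True)
# "if row" skips any empty lines
records = [dict(zip(fields, row)) for row in reader if row]
print(json.dumps(records, indent=4))
```

Because every value stays a string, the leading zeros in "002" are preserved.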

Convert CSV file to JSON with python

I am trying to convert my CSV email list to a JSON format to mass email via API. This is my code thus far, but I am having trouble with the output. Nothing is output in my VS Code editor.
import csv
import json
def make_json(csvFilePath, jsonFilePath):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for rows in csvReader:
            key = rows['No']
            data[key] = rows
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))
csvFilePath = r'/data/csv-leads.csv'
jsonFilePath = r'Names.json'
make_json(csvFilePath, jsonFilePath)
Here is my desired JSON format
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"Name": "Youngstown Coffee",
"ConsentToTrack": "Yes"
},
Here's my CSV list
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen & Bakery,catering#zylberschtein.com,Yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes
It looks like you could use a csv.DictReader to make this easier.
If I have data.csv that looks like this:
Name,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
I can convert it into JSON like this:
>>> import csv
>>> import json
>>> fd = open('data.csv')
>>> reader = csv.DictReader(fd)
>>> print(json.dumps(list(reader), indent=2))
[
{
"Name": "Zylberschtein's Delicatessen",
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes"
},
{
"Name": "Youngstown Coffee",
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes"
}
]
Here I've assumed the headers in the CSV can be used verbatim. I'll update this with an example if you need to modify key names (e.g. convert "No" to "Name").
If you need to rename a column, it might look more like this:
import csv
import json
with open('data.csv') as fd:
    reader = csv.DictReader(fd)
    data = []
    for row in reader:
        row['Name'] = row.pop('No')
        data.append(row)
print(json.dumps(data, indent=2))
Given this input:
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
This will output:
[
{
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes",
"Name": "Zylberschtein's Delicatessen"
},
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes",
"Name": "Youngstown Coffee"
}
]
and to print on my editor is it simply print(json.dumps(list(reader), indent=2))?
I'm not really familiar with your editor; print is how you generate console output in Python.
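To tie the thread together, here is a hedged sketch that renames the column and prints the result, which is the step the original make_json function lacked (it only wrote to a file). The helper name rows_to_json and its rename parameter are made up for this example, and the input is an in-memory string instead of the CSV file.

```python
import csv
import io
import json

def rows_to_json(csv_file, rename=None):
    """Return a JSON array string; `rename` maps old column names to new ones."""
    rename = rename or {}
    records = [
        {rename.get(key, key): value for key, value in row.items()}
        for row in csv.DictReader(csv_file)
    ]
    return json.dumps(records, indent=2)

# One row from the question's CSV list
sample = io.StringIO(
    "No,EmailAddress,ConsentToTrack\n"
    "Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes\n"
)
result = rows_to_json(sample, rename={"No": "Name"})
print(result)  # printing is what makes the result show up in the terminal
```

Unlike the pop-based version, the comprehension keeps "Name" in the first position because it renames keys in place.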

Python Parse generic Json into table format

I am just trying to convert simple Json into CSV making it like a table format, so I can easily load them into my database.
I am trying to create some generic code to parse some Json with different metadata, so I hope I don't have to specify the column name and instead hoping Python will generate the column name itself.
just like this Json
[
{
"name":"mike",
"sal":"1000",
"dept":"IT",
},
{
"name":"Joe",
"sal":"1200",
"dept":"IT",
}
]
to make it format like this:
name sal dept
Mike 1000 IT
Joe 1200 IT
I use the below code but it doesn't work
import json
import csv
infile = open(r'c:\test\test.json', 'r')
outfile = open(r'c:\test\test.csv', 'w')
writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.writerows(row)
Can someone show me some sample code to do this?
Thanks
This will help you:
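import csv
import json
# newline='' stops the csv module from writing blank lines between rows on Windows
with open(r'c:\test\test.json') as infile, open(r'c:\test\test.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'sal', 'dept'])
    writer.writeheader()
    for i in json.load(infile):
        writer.writerow({'name': i['name'], 'sal': i['sal'], 'dept': i['dept']})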
I tried your sample code. Spaces after the colons are optional, but the trailing comma after the last member of each object is not valid JSON and makes json.loads fail, so your .json has to look like this:
[
{
"name": "mike",
"sal": "1000",
"dept": "IT"
},
{
"name": "Joe",
"sal": "1200",
"dept": "IT"
}
]
Then you can read each object into a dict.
Everything else can be found here:
How do I convert this list of dictionaries to a csv file? [Python]
Reading the JSON is fine. You need to make use of csv.DictWriter to write to the csv. It will also allow you to provide fieldnames and hence the headers in the csv file.
This will do the conversion -
import json
import csv
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('test.json', 'r') as infile, open('test.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'sal', 'dept']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in json.load(infile):
        print(row)
        writer.writerow(row)
Refer to https://docs.python.org/2/library/csv.html for further help.
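Since the question asks for generic code that does not hard-code the column names, here is a minimal sketch that derives them from the keys of the JSON objects. It assumes a JSON array of flat objects; the function name json_to_csv is made up for this example, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import json

def json_to_csv(json_text, out_file):
    """Write a JSON array of flat objects as CSV, deriving the header from the keys."""
    rows = json.loads(json_text)
    # Collect column names in first-seen order across all rows
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in columns that a particular row does not have
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames

sample = '[{"name": "mike", "sal": "1000", "dept": "IT"}, {"name": "Joe", "sal": "1200", "dept": "IT"}]'
buffer = io.StringIO()
columns = json_to_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with newline='' as the csv documentation recommends.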
