Read file and convert to JSON - python

I have a file containing comma-separated data and I want the final output to be JSON. I tried the code below, but is there a better way to implement this?
data.txt
1,002, name, address
2,003, name_1, address_2
3,004, name_2, address_3
I want final output like below
{
"Id": "1",
"identifier": "002",
"mye": "name",
"add": "address"
}
{
"Id": "2",
"identifier": "003",
"mye": "name_1",
"add": "address_2"
}
and so on...
Here is the code I am trying:
list = []
with open('data.txt') as reader:
    for line in reader:
        list.append(line.split(','))
print(list)
The above just returns a list, but I need to convert it to the JSON key/value pairs shown above.

Your desired result isn't actually valid JSON; it's just a series of separate dict-like objects. What I think you want is a list of dictionaries. Try this:
fields = ["Id", "identifier", "mye", "add"]
my_json = []
with open('data.txt') as reader:
    for line in reader:
        vals = line.rstrip().split(',')
        # index by position: vals.index(val) would pick the wrong key for repeated values
        my_json.append({fields[i]: val.strip() for i, val in enumerate(vals)})
print(my_json)
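A minimal sketch of the same idea with the standard csv module, assuming the four column names from the question. `io.StringIO` stands in for data.txt here so the snippet is self-contained, and `skipinitialspace=True` drops the blank that follows each comma.

```python
import csv
import io
import json

# Stand-in for data.txt (sample rows taken from the question).
sample = io.StringIO("1,002, name, address\n2,003, name_1, address_2\n")

fields = ["Id", "identifier", "mye", "add"]

# skipinitialspace=True ignores the space that follows each comma.
reader = csv.reader(sample, skipinitialspace=True)

# zip pairs each column name with the value in the same position;
# "if row" skips empty lines, which csv.reader returns as [].
records = [dict(zip(fields, row)) for row in reader if row]

print(json.dumps(records, indent=4))
```

With a real file, replace `sample` with `open('data.txt', newline='')` inside a `with` block.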

Something like this should work:
import json
dataList = []
with open('data.txt') as reader:
    # split lines in a way that strips unnecessary whitespace and newlines
    for line in reader.read().replace(' ', '').split('\n'):
        if not line:
            continue  # skip blank lines, e.g. the one after a trailing newline
        lineData = line.split(',')
        dataList.append({
            "Id": lineData[0],
            "identifier": lineData[1],
            "mye": lineData[2],
            "add": lineData[3]
        })
out_json = json.dumps(dataList)
print(out_json)
Note that you can change this line:
out_json = json.dumps(dataList)
to
out_json = json.dumps(dataList, indent=4)
and change the indent value to format the JSON output.
And you can write it to a file if you want to:
with open("out.json", "w") as f:
    f.write(out_json)

As an extension to one of the suggestions above, you can consider zip instead of a comprehension:
import json
my_json = []
dict_header=["Id","identifier","mye","add"]
with open('data.txt') as fh:
    for line in fh:
        my_json.append(dict ( zip ( dict_header, line.split('\n')[0].split(',')) ))
out_file = open("test1.json", "w")
json.dump(my_json, out_file, indent = 4, sort_keys = False)
out_file.close()
This of course assumes you saved the file from Excel as text; if it is tab delimited as below, split on '\t' rather than ',':
1 2 name address
2 3 name_1 address_2
3 4 name_2 address_3

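I would suggest using the pandas library; it can't get much easier.
import pandas as pd
# dtype=str keeps leading zeros such as "002"; skipinitialspace drops the space after each comma
df = pd.read_csv("data.txt", header=None, dtype=str, skipinitialspace=True)
df.columns = ["Id","identifier", "mye","add"]
# orient="records" writes a list of objects, one per row
df.to_json("output.json", orient="records")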

Related

CSV to JSON in Python takes only first and last rows

I made a function that converts data from given csv file to a json object, and the weird thing is that it only gets the first and last element of the CSV.
My csv structure is 2 columns: name,days
Example:
name,days
John,17
Fred,2
Michelle,22
When I get the json object, and print it, it gives me:
jsondata is: {
"0": {
"name": "John",
"days": "17"
},
"1": {
"name": "Michelle",
"days": "22"
}
}
Here is my code:
data = {}
with open(file, "rt") as csvf:
    csvReader = csv.DictReader(csvf)
    i = 0
    for rows in csvReader:
        data[i] = rows
        i =+ 1
jsondata = json.dumps(data, indent=4)
print("jsondata is: ", jsondata)
The problem is the line i =+ 1: it assigns +1 to i rather than incrementing it (you meant i += 1), so every row after the first is stored under key 1 and only the first and last rows survive. You don't really need i as a counter at all; here are two suggestions:
data = []
with open(file, "rt") as csvf:
    csvReader = csv.DictReader(csvf)
    for rows in csvReader:
        data.append(rows)
jsondata = json.dumps(data, indent=4)
print("jsondata is: ", jsondata)
or
data = {}
with open(file, "rt") as csvf:
    csvReader = csv.DictReader(csvf)
    for i, rows in enumerate(csvReader):
        data[i] = rows
jsondata = json.dumps(data, indent=4)
print("jsondata is: ", jsondata)
In both cases you will be able to access rows by index, e.g. data[0], data[1], ...
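To see the effect of `=+` versus `+=` in isolation, here is a small sketch that replaces the CSV rows with a plain list of the three names from the question (the list is a stand-in, not the original file):

```python
# `i =+ 1` parses as `i = (+1)`: it assigns 1 instead of incrementing.
names = ["John", "Fred", "Michelle"]

buggy = {}
i = 0
for name in names:
    buggy[i] = name
    i =+ 1   # always sets i to 1, so later rows overwrite key 1

fixed = {}
i = 0
for name in names:
    fixed[i] = name
    i += 1   # increments i as intended

print(buggy)
print(fixed)
```

The first dictionary ends up with only keys 0 and 1, which matches the "first and last rows" symptom.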

Convert CSV file to JSON with python

I am trying to convert my CSV email list to JSON format to mass email via an API. This is my code thus far, but I am having trouble with the output: nothing is printed in my VS Code editor.
import csv
import json
def make_json(csvFilePath, jsonFilePath):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for rows in csvReader:
            key = rows['No']
            data[key] = rows
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))
csvFilePath = r'/data/csv-leads.csv'
jsonFilePath = r'Names.json'
make_json(csvFilePath, jsonFilePath)
Here is my desired JSON format
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"Name": "Youngstown Coffee",
"ConsentToTrack": "Yes"
},
Here's my CSV list
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen & Bakery,catering#zylberschtein.com,Yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes
It looks like you could use a csv.DictReader to make this easier.
If I have data.csv that looks like this:
Name,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
I can convert it into JSON like this:
>>> import csv
>>> import json
>>> fd = open('data.csv')
>>> reader = csv.DictReader(fd)
>>> print(json.dumps(list(reader), indent=2))
[
{
"Name": "Zylberschtein's Delicatessen",
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes"
},
{
"Name": "Youngstown Coffee",
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes"
}
]
Here I've assumed the headers in the CSV can be used verbatim. I'll update this with an example if you need to modify key names (e.g. convert "No" to "Name").
If you need to rename a column, it might look more like this:
import csv
import json
with open('data.csv') as fd:
    reader = csv.DictReader(fd)
    data = []
    for row in reader:
        row['Name'] = row.pop('No')
        data.append(row)
print(json.dumps(data, indent=2))
Given this input:
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
This will output:
[
{
"EmailAddress": "catering#zylberschtein.com",
"ConsentToTrack": "yes",
"Name": "Zylberschtein's Delicatessen"
},
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"ConsentToTrack": "yes",
"Name": "Youngstown Coffee"
}
]
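If the renamed key should stay in its original column position rather than move to the end, one option is to rebuild each row with a rename mapping. This is only a sketch: `csv_to_records` and its `rename` parameter are hypothetical names, and `io.StringIO` stands in for the CSV file.

```python
import csv
import io
import json

def csv_to_records(fp, rename=None):
    """Read CSV rows as dicts, optionally renaming columns (hypothetical helper)."""
    rename = rename or {}
    records = []
    for row in csv.DictReader(fp):
        # Rebuilding the dict keeps the columns in their original order.
        records.append({rename.get(key, key): value for key, value in row.items()})
    return records

# Stand-in for the CSV file from the question.
sample = io.StringIO(
    "No,EmailAddress,ConsentToTrack\n"
    "Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes\n"
)
records = csv_to_records(sample, rename={"No": "Name"})
print(json.dumps(records, indent=2))
```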
and to print on my editor is it simply print(json.dumps(list(reader), indent=2))?
I'm not really familiar with your editor; print is how you generate console output in Python.

Add character and remove the last comma in a JSON file

I am trying to create a JSON file from a CSV. The code below creates the data, but not quite in the form I want. I have some experience in Python. From my understanding the JSON file should be written like this: [{},{},...,{}].
How do I:
1. Remove the last ','? I am able to insert the ',' after each row.
2. Insert '[' at the very beginning and ']' at the very end? I tried inserting it into outputfile.write('['...etc), but it shows up in too many places.
3. Not include the header on the first line of the JSON file?
Names.csv:
id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "
Code:
import csv
import json
import os
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        json.dump(row,outfile)
        outfile.write("," + "\n" )
Output so far:
{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},
First, how do you skip the header? That's easy:
next(infile) # skip the first line
for line in infile:
However, you may want to consider using a csv.DictReader for input. It handles reading the header line, and using the information there to create a dict for each row, and splitting the rows for you (as well as handling cases you may not have thought of, like quoted or escaped text that can be present in CSV files):
for row in csv.DictReader(infile):
    json.dump(row, outfile)
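To illustrate the quoting point, this short sketch compares a plain `str.split(',')` with `csv.reader` on one line from Names.csv (wrapped in `io.StringIO` so it runs without the file):

```python
import csv
import io

# One data line from Names.csv, including its quoted field.
line = '123,Biology,"Ali Smith, Jon Doe"\n'

# Plain split cuts inside the quoted field and keeps quotes and the newline.
naive = line.split(',')

# csv.reader honours the quotes and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```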
Now onto the harder problem.
A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this:
def rows(infile):
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        yield row
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # genjson stands for a hypothetical iterative JSON library, not a real module
    genjson.dump(rows(infile), outfile)
The stdlib json.JSONEncoder has an example in the docs that does exactly this—although not very efficiently, because it first consumes the entire iterator to build a list, then dumps that:
class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)
j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))
And really, if you're willing to build a whole list rather than encode line by line, it may be simpler to just do the listifying explicitly:
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)
You can go further by also overriding the iterencode method, but this will be a lot less trivial, and you'd probably want to look for an efficient, well-tested streaming iterative JSON library on PyPI instead of building it yourself from the json module.
But, meanwhile, here's a direct solution to your question, changing as little as possible from your existing code:
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        # add the ,\n _before_ each row except the first
        if i:
            outfile.write(',\n')
        json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')
This trick—treating the first element special rather than the last—simplifies a lot of problems of this type.
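The "separator before every item except the first" idea can be pulled out into a small reusable helper. This is a sketch under that assumption; `dump_json_array` is a hypothetical name, and it writes to any file-like object (an `io.StringIO` here).

```python
import io
import json

def dump_json_array(items, fp):
    """Write any iterable to fp as a JSON array, one element per line."""
    fp.write('[\n')
    for i, item in enumerate(items):
        if i:  # the separator goes before every item except the first
            fp.write(',\n')
        json.dump(item, fp)
    fp.write('\n]')

out = io.StringIO()
# A generator works too: items are encoded one at a time, never as a full list.
dump_json_array(({"id": n} for n in range(3)), out)
print(out.getvalue())
```

An empty iterable produces `[` and `]` separated only by whitespace, which still parses as an empty JSON array.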
Another way to simplify things is to actually iterate over adjacent pairs of lines, using a minor variation on the pairwise example in the itertools docs:
import itertools
def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        json.dump(row,outfile)
        # add the , if there is a next line
        if nextline is not None:
            outfile.write(',')
        outfile.write('\n')
    # write the final ]
    outfile.write(']')
This is just as efficient as the previous version, and conceptually simpler—but a lot more abstract.
With a minimal edit to your code, you can create a list of dictionaries in Python and dump it to file as JSON all at once (assuming your dataset is small enough to fit in memory):
import csv
import json
import os
rows = [] # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        id, team_name, *team_members = line.split(',')
        row["id"] = id;
        row["team_name"] = team_name;
        row["team_members"] = team_members
        rows.append(row) # Append row to list
    json.dump(rows[1:], outfile) # Write entire list to file (except first row)
As an aside, you should not use id as a variable name in Python as it is a built-in function.
Pandas can handle this with ease:
import json
import pandas as pd
df = pd.read_csv('names.csv', dtype=str)
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)
Seems like it would be a lot easier to use the csv.DictReader class instead of reinventing the wheel:
import csv
import json
data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)
with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)
Contents of names1.json file following execution (I used indent=4 just to make it more human readable):
[
{
"id": "123",
"team_name": "Biology",
"team_members": "Ali Smith, Jon Doe"
},
{
"id": "234",
"team_name": "Math",
"team_members": "Jane Smith"
},
{
"id": "345",
"team_name": "Statistics ",
"team_members": "Matt P, Albert Shaw"
},
{
"id": "456",
"team_name": "Chemistry",
"team_members": "Andrew M, Matt Shaw, Ali Smith"
},
{
"id": "678",
"team_name": "Physics",
"team_members": "Joe Doe, Jane Smith, Ali Smith"
}
]
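In the output above, team_members is still a single string. If a list of names is wanted instead, as in the question's own attempt, a possible follow-up step is sketched below; the sample rows are copied from Names.csv and `io.StringIO` replaces the file.

```python
import csv
import io
import json

# Two rows from Names.csv, used as an in-memory stand-in for the file.
sample = io.StringIO(
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith\n'
)

data = []
for row in csv.DictReader(sample):
    # Turn the comma-separated member string into a list of trimmed names.
    row["team_members"] = [name.strip() for name in row["team_members"].split(",")]
    data.append(row)

print(json.dumps(data, indent=4))
```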

Python Help needed : Unable to read contents of a tsv file and populate in the dictionary as desired

What I have to parse:
I have a TSV file that looks like this:
https://i.stack.imgur.com/yxsXD.png
What is the end goal:
My goal is to read the TSV file and populate its contents into a dictionary of nested lists without using a csv parser.
In the end the in_memory_table structure would look like this (of course with more than two rows):
{
"header": [
"STATION",
"STATION_ID",
"ELEVATION",
"LAT",
"LONG",
"DATE",
"MNTH_MIN",
"MNTH_MAX"
],
"rows": [
[
"Tukwila",
"12345afbl",
"10",
"47.5463454",
"-122.34234234",
"2016-01-01",
"10",
"41"
],
[
"Tukwila",
"12345afbl",
"10",
"47.5463454",
"-122.34234234",
"2016-02-01",
"5",
"35"
],
]
}
My code looks like this:
in_memory_table = {
    'header': [],
    'rows': [] }
with open('fahrenheit_monthly_readings.tsv') as f:
    in_file = f.readlines()
i = 0
for line in in_file:
    temp_list = [line.split('\t')]
    if (i == 0):
        in_memory_table['header']= line
    elif(i != 0):
        in_memory_table['rows'].append(line)
    i += 1
print("\n",in_memory_table)
Output of the code:
C:\Users\svats\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/svats/PycharmProjects/BrandNew/module4_lab2/module4_lab2.py
{'header': 'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n', 'rows': ['Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-02-01\t5\t35\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-03-01\t32\t47\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-04-01\t35\t49\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-05-01\t41\t60\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-06-01\t50\t72\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-07-01\t57\t70\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-08-01\t68\t79\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-09-01\t55\t71\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-10-01\t47\t77\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-11-01\t32\t66\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-12-01\t27\t55\n']}
Help needed:
I am very close to getting the solution.
I have 2 questions:
1. How do I get rid of the \t in the output?
2. My output is a little different from the desired output. How do I get it?
If you rewrite your code as:
for line in in_file:
    print('repr(line) before :', repr(line) )
    temp_list = [line.split()]
    #line = line.split()
    print('temp_list :',temp_list)
    print('repr(line) after :', repr(line) )
    print(' %s -----------------' % i)
    if ........
and uncomment the line #line = line.split(),
you'll understand the reason for the bad result you obtain.
The reason is that line.split() doesn't change the object named line;
it creates a new object (the list you want), and the name line must be reassigned to it if you want that name to refer to the resulting list.
Note that the method str.split([sep[, maxsplit]]) uses a different algorithm depending on whether the parameter sep is None or not; see the documentation at https://docs.python.org/2/library/stdtypes.html#str.split for this point.
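A short sketch of that difference in str.split, using a made-up line with one empty field (the data is illustrative, not from the original file):

```python
# A tab-separated line whose third field is empty (illustrative data).
line = "Tukwila\t12345afbl\t\t2016-01-01\n"

# sep omitted (None): split on runs of whitespace, so the empty field vanishes
# and the trailing newline is discarded.
by_whitespace = line.split()

# sep="\t": every tab delimits a field, so the empty field is kept;
# the newline has to be stripped separately.
by_tab = line.rstrip("\n").split("\t")

print(by_whitespace)
print(by_tab)
```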
That said, there's a better way.
with open('fahrenheit_monthly_readings.tsv','r') as f:
    in_memory_table = {'header':next(f).split()}
    in_memory_table['rows'] = [line.split() for line in f]
or
with open('fahrenheit_monthly_readings.tsv','r') as f:
    in_memory_table = {'header':next(f).split()}
    in_memory_table['rows'] = list(map(str.split, f))
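Both versions split on any whitespace, which works for the sample data but would break a value that contains a space. A variant that splits strictly on tabs might look like the sketch below; `read_tsv_table` is a hypothetical helper, and the "Seattle Tacoma" row is invented to show a field with a space.

```python
import io

def read_tsv_table(fp):
    """Return {'header': [...], 'rows': [[...], ...]} from tab-separated text."""
    # Strip line endings first so the last field carries no newline.
    lines = (line.rstrip("\r\n") for line in fp)
    header = next(lines).split("\t")
    # Skip empty lines, e.g. a blank line at the end of the file.
    rows = [line.split("\t") for line in lines if line]
    return {"header": header, "rows": rows}

# Invented sample with a space inside the first field.
sample = io.StringIO(
    "STATION\tDATE\tMNTH_MIN\n"
    "Seattle Tacoma\t2016-01-01\t10\n"
)
table = read_tsv_table(sample)
print(table)
```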

Python CSV to JSON W/ Array Output

I'm trying to take data from a CSV and put it in a top-level array in JSON format.
Currently I am running this code:
import csv
import json
csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("ID","Artist","Song", "Artist")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
The CSV file is formatted as so:
| 1 | Empire of the Sun | We Are The People | Walking on a Dream |
| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |
Where = Column 1: ID | Column 2: Artist | Column 3: Song | Column 4: Album
And getting this output:
{"Song": "Empire of the Sun", "ID": "1", "Artist": "Walking on a Dream"}
{"Song": "M83", "ID": "2", "Artist": "Hurry Up We're Dreaming"}
I'm trying to get it to look like this though:
{
"Music": [
{
"id": 1,
"Artist": "Empire of the Sun",
"Name": "We are the People",
"Album": "Walking on a Dream"
},
{
"id": 2,
"Artist": "M83",
"Name": "Steve McQueen",
"Album": "Hurry Up We're Dreaming"
},
]
}
Pandas solves this really simply. First, read the file:
import pandas
df = pandas.read_csv('music.csv', names=("id","Artist","Song", "Album"))
Now you have some options. The quickest way to get a proper json file out of this is simply
df.to_json('file.json', orient='records')
Output:
[{"id":1,"Artist":"Empire of the Sun","Song":"We Are The People","Album":"Walking on a Dream"},{"id":2,"Artist":"M83","Song":"Steve McQueen","Album":"Hurry Up We're Dreaming"}]
This doesn't handle the requirement that you want it all in a "Music" object or the order of the fields, but it does have the benefit of brevity.
To wrap the output in a Music object, we can use to_dict:
import json
with open('file.json', 'w') as f:
    json.dump({'Music': df.to_dict(orient='records')}, f, indent=4)
Output:
{
"Music": [
{
"id": 1,
"Album": "Walking on a Dream",
"Artist": "Empire of the Sun",
"Song": "We Are The People"
},
{
"id": 2,
"Album": "Hurry Up We're Dreaming",
"Artist": "M83",
"Song": "Steve McQueen"
}
]
}
I would advise you to reconsider insisting on a particular order for the fields since the JSON specification clearly states "An object is an unordered set of name/value pairs" (emphasis mine).
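For what it's worth, on Python 3.7 or later a plain dict keeps insertion order and the json module writes keys in that order, so the field order in the file follows the order in which the dict was built. A small sketch, assuming such a Python version:

```python
import json

# Keys inserted in the order wanted in the output (assumes Python 3.7+).
record = {"id": 1, "Artist": "M83", "Song": "Steve McQueen",
          "Album": "Hurry Up We're Dreaming"}

as_inserted = json.dumps(record)               # keeps insertion order
as_sorted = json.dumps(record, sort_keys=True) # sorts keys by name instead

print(as_inserted)
print(as_sorted)
```

Consumers of the JSON are still free to ignore that order, which is the point of the quoted specification.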
Alright, this is untested, but try the following:
import csv
import json
from collections import OrderedDict
fieldnames = ("ID","Artist","Song", "Album")
entries = []
#the with statement is better since it handles closing your file properly after usage.
with open('music.csv', 'r') as csvfile:
    #before Python 3.7 a standard dict did not guarantee any order,
    #but if you write into an OrderedDict, order of write operations will be kept in output.
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        entry = OrderedDict()
        for field in fieldnames:
            entry[field] = row[field]
        entries.append(entry)
output = {
    "Music": entries
}
with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile)
    jsonfile.write('\n')
Your logic is in the wrong order. json is designed to convert a single object into JSON, recursively. So you should always be thinking in terms of building up a single object before calling dump or dumps.
First collect it into an array:
music = [r for r in reader]
Then put it in a dict:
result = {'Music': music}
Then dump to JSON:
json.dump(result, jsonfile)
Or all in one line:
json.dump({'Music': [r for r in reader]}, jsonfile)
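Put together, the steps above might look like the following sketch. It assumes a comma-separated input row and uses `io.StringIO` objects in place of music.csv and file.json so it can run on its own.

```python
import csv
import io
import json

# In-memory stand-in for music.csv (assumed to be comma-separated here).
sample = io.StringIO("1,M83,Steve McQueen,Hurry Up We're Dreaming\n")
fieldnames = ("id", "Artist", "Song", "Album")

reader = csv.DictReader(sample, fieldnames)

# Build the whole structure first: one top-level object holding the array.
result = {"Music": list(reader)}

# A single dump call then yields one valid JSON document.
out = io.StringIO()
json.dump(result, out, indent=2)
print(out.getvalue())
```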
"Ordered" JSON
If you really care about the order of object properties in the JSON (even though you shouldn't), you shouldn't use the DictReader. Instead, use the regular reader and create OrderedDicts yourself:
from collections import OrderedDict
...
reader = csv.reader(csvfile)
music = [OrderedDict(zip(fieldnames, r)) for r in reader]
Or in a single line again:
json.dump({'Music': [OrderedDict(zip(fieldnames, r)) for r in reader]}, jsonfile)
Other
Also, use context managers for your files to ensure they're closed properly:
with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    # Rest of your code inside this block
It didn't write to the JSON file in the order I would have liked
The csv.DictReader class returns Python dict objects. In Python versions before 3.7, dictionaries are unordered collections, so you have no control over their presentation order.
Python does provide an OrderedDict, which you can use if you avoid using csv.DictReader().
and it skipped the song name altogether.
This is because the file is not really a CSV file. In particular, each line begins and ends with the field separator. We can use .strip("|") to fix this.
I need all this data to be output into an array named "Music"
Then the program needs to create a dict with "Music" as a key.
I need it to have commas after each artist info. In the output I get I get
This problem occurs because you call json.dump() multiple times. You should call it only once if you want a valid JSON file.
Try this:
import csv
import json
from collections import OrderedDict
def MyDictReader(fp, fieldnames):
    fp = (x.strip().strip('|').strip() for x in fp)
    reader = csv.reader(fp, delimiter="|")
    reader = ([field.strip() for field in row] for row in reader)
    dict_reader = (OrderedDict(zip(fieldnames, row)) for row in reader)
    return dict_reader
fieldnames = ("ID","Artist","Song", "Album")
# the with statement closes both files even if an error occurs
with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    reader = MyDictReader(csvfile, fieldnames)
    json.dump({"Music": list(reader)}, jsonfile, indent=2)
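The desired output in the question also uses numeric ids and the key "Name" for the song. A sketch of one way to get there without the csv module follows; `parse_pipe_row` is a hypothetical helper, and the two sample lines are copied from the question.

```python
import io
import json

def parse_pipe_row(line, fieldnames):
    """Split '| a | b |' into a dict keyed by fieldnames (hypothetical helper)."""
    # Drop the outer pipes first, then split on the inner ones and trim cells.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return dict(zip(fieldnames, cells))

# The two sample lines from the question, as an in-memory file.
sample = io.StringIO(
    "| 1 | Empire of the Sun | We Are The People | Walking on a Dream |\n"
    "| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |\n"
)
fieldnames = ("id", "Artist", "Name", "Album")

music = []
for line in sample:
    if line.strip():
        entry = parse_pipe_row(line, fieldnames)
        entry["id"] = int(entry["id"])  # the desired output shows numeric ids
        music.append(entry)

print(json.dumps({"Music": music}, indent=2))
```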
