CSV file to JSON file in Python - python

I have read quite a lot of posts here and elsewhere, but I can't seem to find the solution. And I do not want to convert it online.
I would like to convert a CSV file to a JSON file (no nesting, even though I might need it in the future) with this code I found here:
import csv
import json
with open('sample.csv', 'r') as f:
    reader = csv.DictReader(f, fieldnames=("id", "name", "lat", "lng"))
    out = json.dumps([row for row in reader])
print(out)
Awesome, simple, and it works. But I do not get a .json file, only text output that, if I copy and paste it, is one long line.
I need JSON that is readable and, ideally, saved to a .json file.
Is this possible?

To get more readable JSON, try the indent argument in dumps():
print(json.dumps(..., indent=4))
However - to look more like the original CSV file, what you probably want is to encode each line separately, and then join them all up using the JSON array syntax:
out = "[\n\t" + ",\n\t".join([json.dumps(row) for row in reader]) + "\n]"
That should give you something like:
[
{"id": 1, "name": "foo", ...},
{"id": 2, "name": "bar", ...},
...
]
If you need help writing the result to a file, try this tutorial.

If you want a more readable format of the JSON file, use it like this:
json.dump(output_value, open('filename','w'), indent=4, sort_keys=False)
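Putting the suggestions above together, here is a minimal sketch that reads the CSV and writes an indented .json file. The function name csv_to_json_file and its parameters are made up for illustration. It assumes the CSV has a header row unless fieldnames is passed, as in the question's code.

```python
import csv
import json


def csv_to_json_file(csv_path, json_path, fieldnames=None, indent=4):
    """Read a CSV file and write its rows as an indented JSON array."""
    # newline="" lets the csv module handle line endings itself.
    with open(csv_path, newline="", encoding="utf-8") as src:
        # With fieldnames=None, DictReader takes the keys from the first row.
        rows = list(csv.DictReader(src, fieldnames=fieldnames))

    # The with block closes the file, so the JSON is fully written on exit.
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=indent)
    return rows
```

For the headerless file in the question, the call would look like csv_to_json_file('sample.csv', 'sample.json', fieldnames=("id", "name", "lat", "lng")). All values stay strings, because the csv module does not infer types.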

Here's a full script. This script uses the comma-separated values of the first line as the keys for the JSON output. The output JSON file will be automatically created or overwritten using the same file name as the input CSV file name just with the .csv file extension replaced with .json.
Example CSV file:
id,longitude,latitude
1,32.774,-124.401
2,32.748,-124.424
4,32.800,-124.427
5,32.771,-124.433
Python script:
import csv
import json

csvfile = open('sample.csv', 'r')
jsonfile = open('sample.csv'.replace('.csv', '.json'), 'w')
jsonfile.write('{"' + 'sample.csv'.replace('.csv', '') + '": [\n')  # Write JSON parent of data list
fieldnames = csvfile.readline().replace('\n', '').split(',')  # Get fieldnames from first line of csv
num_lines = sum(1 for line in open('sample.csv')) - 1  # Count total lines in csv minus header row
reader = csv.DictReader(csvfile, fieldnames)
i = 0
for row in reader:
    i += 1
    json.dump(row, jsonfile)
    if i < num_lines:
        jsonfile.write(',')  # No comma after the last row, so the JSON stays valid
    jsonfile.write('\n')
jsonfile.write(']}')
jsonfile.close()
csvfile.close()

Related

How to convert csv file into json in python so that the header of csv are keys of every json value

I have this use case
please create a function called “myfunccsvtojson” that takes in a filename path to a csv file (please refer to attached csv file) and generates a file that contains streamable line delimited JSON.
• Expected filename will be based on the csv filename, i.e. Myfilename.csv will produce Myfilename.json or File2.csv will produce File2.json. Please show this in your code and should not be hardcoded.
• csv file has 10000 lines including the header
• output JSON file should contain 9999 lines
• Sample JSON lines from the csv file below:
CSV:
nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
nm0000001,Fred Astaire,1899,1987,"soundtrack,actor,miscellaneous","tt0072308,tt0043044,tt0050419,tt0053137"
nm0000002,Lauren Bacall,1924,2014,"actress,soundtrack","tt0071877,tt0038355,tt0117057,tt0037382"
nm0000003,Brigitte Bardot,1934,\N,"actress,soundtrack,producer","tt0057345,tt0059956,tt0049189,tt0054452"
JSON lines:
{"nconst":"nm0000001","primaryName":"Fred Astaire","birthYear":1899,"deathYear":1987,"primaryProfession":"soundtrack,actor,miscellaneous","knownForTitles":"tt0072308,tt0043044,tt0050419,tt0053137"}
{"nconst":"nm0000002","primaryName":"Lauren Bacall","birthYear":1924,"deathYear":2014,"primaryProfession":"actress,soundtrack","knownForTitles":"tt0071877,tt0038355,tt0117057,tt0037382"}
{"nconst":"nm0000003","primaryName":"Brigitte Bardot","birthYear":1934,"deathYear":null,"primaryProfession":"actress,soundtrack,producer","knownForTitles":"tt0057345,tt0059956,tt0049189,tt0054452"}
What I am not able to understand is how the header can be used as the key for every value in the JSON.
Has anyone come across this scenario who can help me out?
This is what I was trying. I know the loop is not correct, but I am still figuring it out:
with open(file_name, encoding = 'utf-8') as file:
    csv_data = csv.DictReader(file)
    csvreader = csv.reader(file)
    # print(csv_data)
    keys = next(csvreader)
    print (keys)
    for i,Value in range(len(keys)), csv_data:
        data[keys[i]] = Value
        print (data)
You can convert your CSV to a pandas DataFrame and output it as JSON:
import pandas as pd
df = pd.read_csv('data.csv')
df.to_json(orient='records')  # returns a JSON string; pass a path as the first argument to write a file
import csv
import json
def csv_to_json(csv_file_path, json_file_path):
    data_dict = []
    with open(csv_file_path, encoding = 'utf-8') as csv_file_handler:
        csv_reader = csv.DictReader(csv_file_handler)
        for rows in csv_reader:
            data_dict.append(rows)
    with open(json_file_path, 'w', encoding = 'utf-8') as json_file_handler:
        json_file_handler.write(json.dumps(data_dict, indent = 4))
csv_to_json("/home/devendra/Videos/stackoverflow/Names.csv", "/home/devendra/Videos/stackoverflow/Names.json")
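Neither answer above produces line-delimited JSON or derives the output name, so here is a rough sketch of the function the question describes. The helper convert_value is hypothetical, and its rules are assumptions read off the sample lines: \N becomes null and digit-only strings become integers. Note that a digit-only value with leading zeros would lose them.

```python
import csv
import json
from pathlib import Path


def convert_value(value):
    """Map the CSV null marker to None and digit-only strings to int."""
    if value is None or value == r"\N":
        return None
    if value.isdecimal():
        return int(value)
    return value


def myfunccsvtojson(csv_path):
    """Write one JSON object per CSV data row and return the output path."""
    csv_path = Path(csv_path)
    # Myfilename.csv -> Myfilename.json, derived rather than hardcoded.
    json_path = csv_path.with_suffix(".json")
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(json_path, "w", encoding="utf-8") as dst:
        # DictReader uses the header row as the key for every value.
        for row in csv.DictReader(src):
            record = {key: convert_value(value) for key, value in row.items()}
            dst.write(json.dumps(record, separators=(",", ":")) + "\n")
    return json_path
```

Because each row is written as soon as it is read, the whole file never has to sit in memory, and a CSV with 10000 lines including the header yields 9999 JSON lines.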

Reading in CSV Files with newline characters embedded

I am currently reading in data from a csv file and inputting tokens and their definitions into a dictionary. The code works fine until it hits a place where the data in the CSV file looks like this:
"Token000\nip address\ntesttestest"
Here is my code so far:
for line in f:
    if "Token" in line and re.search("Token\d", line):
        commaIndex = line.index(",", line.index("Token"))
        csvDict[line[line.index("Token"): commaIndex]] = line[commaIndex + 1: line.index(",", commaIndex + 1)]
Use this:
import csv
data={}
with open('your_file.csv') as csv_file:
    reader=csv.reader(csv_file, skipinitialspace=True, quotechar="'")
    for row in reader:
        data[row[0]]=row[1:]
print(data)
I recommend that you take a look at the csv module documentation.
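To illustrate why the csv module helps here, the sketch below parses a record whose quoted field contains newlines. The sample data is made up, and it assumes the fields are quoted with double quotes, which is the csv module's default quotechar. Iterating the file line by line splits such a record apart, while csv.reader keeps it together.

```python
import csv
import io


def read_tokens(stream):
    """Map the first column of each record to the list of remaining columns."""
    data = {}
    for row in csv.reader(stream, skipinitialspace=True):
        if row:  # skip blank records
            data[row[0]] = row[1:]
    return data


# Hypothetical sample: the second field is quoted and spans three lines.
sample = 'Token000,"ip address\nline two\nline three",extra\nToken001,plain,other\n'
tokens = read_tokens(io.StringIO(sample))
print(tokens["Token000"][0].splitlines())
```

When reading from a real file, open it with newline="" so the csv module sees the embedded line endings unchanged.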

Trying to convert a big tsv file to json

I have a TSV file that I need to convert into a JSON file. I'm using this Python script, which exports an empty JSON file.
import json
data={}
with open('data.json', 'w') as outfile,open("data.tsv","r") as f:
    for line in f:
        sp=line.split()
        data.setdefault("data",[])
    json.dump(data, outfile)
This can be done with pandas, but I am not sure about performance:
import pandas as pd
df = pd.read_csv('data.tsv',sep='\t') # read your tsv file
df.to_json('data.json') # save it as json; see orient='values' or 'columns' as per your requirements
You never use the sp in your code.
To properly convert the tsv, you should read the first line separately, to get the "column names", then read the following lines and populate a list of dictionaries.
Here's what your code should look like:
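import json
data = []
with open('data.json', 'w') as outfile, open("data.tsv", "r") as f:
    firstline = f.readline()
    columns = firstline.rstrip('\n').split('\t')  # column names from the header line
    for line in f:  # the header is already consumed, so no data row is skipped
        values = line.rstrip('\n').split('\t')  # split on tabs only, so values may contain spaces
        entry = dict(zip(columns, values))
        data.append(entry)
    json.dump(data, outfile)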
This will output a file containing a list of tsv rows as objects.
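As an alternative to splitting lines by hand, the csv module can read tab-separated files directly. This is only a sketch: the function name tsv_to_json is hypothetical, and wrapping the rows under a "data" key is an assumption taken from the setdefault("data", []) call in the question.

```python
import csv
import json


def tsv_to_json(tsv_path, json_path):
    """Convert a TSV file with a header row into {"data": [row, ...]}."""
    with open(tsv_path, newline="", encoding="utf-8") as src:
        # delimiter="\t" makes DictReader split on tabs instead of commas.
        rows = list(csv.DictReader(src, delimiter="\t"))
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump({"data": rows}, dst)
    return len(rows)
```

This holds every row in memory before writing. For a file too large for that, writing one JSON object per line inside the read loop would avoid building the list.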

formatting .csv to json in python for individual key:value pairs

I'm trying format individual key:value pairs from a csv document to JSON and using json.dump(). While it seems to be working well for the most part, it is turning my integers into strings(or perhaps I need to turn my strings into integers, depending on which way it's looked at), which i do not want, and I also need one key:value pair to become a JSON array.
my code is basically this at the moment:
import csv
import json
csvfile = open('spreadsheet.csv', 'r')
jsonfile = open('fileTo.json', 'w')
fieldnames = ("Id","name","TypeId","Type", "listHere")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile, sort_keys=True, indent=4, separators=(',', ':'))
    jsonfile.write(',')
    jsonfile.write('\n')
where I need Id and TypeId to be integers, and listHere to become a JSON array.
currently the output is as such:
[
    {
        "name":"someName",
        "Id":"1",
        "Type":"someType",
        "TypeId":"2",
        "listHere":"someList"
    },
]
Where what I need is:
[
    {
        "name":"someName",
        "Id":1,
        "Type":"someType",
        "TypeId":2,
        "listHere":
        [
            "someList"
        ]
    },
]
I read through the docs, but do not really see how to do it with a spreadsheet that has thousands of entries in it. Any help will be greatly appreciated. Thanks
csv doesn't support column types, although that would be nice.
The following code (untested) has a "fixer" function for some fields. Before each row is translated into JSON, some fields' values are translated using a fixer function. int(field) in this case.
Note: although each row is output as JSON, the entire list is not. Currently every row, including the last, is followed by a trailing ",". Consider using json.JSONEncoder().iterencode() to "stream" the data to a JSON file.
import csv
import json
csvfile = open('spreadsheet.csv', 'r')
jsonfile = open('fileTo.json', 'w')
fieldnames = ("Id","name","TypeId","Type", "listHere")
fieldfixers = {
    'Id': int,
    'TypeId': int,  # the question asks for Id and TypeId as integers
}
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    for key, value in list(row.items()):  # items() replaces Python 2's iteritems()
        ffunc = fieldfixers.get(key)
        if ffunc:
            row[key] = ffunc(value)
    json.dump(row, jsonfile, sort_keys=True, indent=4, separators=(',', ':'))
    jsonfile.write(',')
    jsonfile.write('\n')
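One way to get a valid JSON array without the trailing comma is to collect the converted rows and dump them in a single call. The sketch below takes that route instead of iterencode(). The names convert_rows and write_json_array are made up, and wrapping listHere in a one-element list is an assumption based on the desired output in the question.

```python
import csv
import json

FIELDNAMES = ("Id", "name", "TypeId", "Type", "listHere")

# Per-column converters; columns not listed here are left as strings.
FIELD_FIXERS = {
    "Id": int,
    "TypeId": int,
    "listHere": lambda value: [value],  # assumption: a single value becomes a one-item array
}


def convert_rows(lines):
    """Yield one dict per CSV line with the configured columns converted."""
    for row in csv.DictReader(lines, fieldnames=FIELDNAMES):
        yield {
            key: FIELD_FIXERS.get(key, lambda v: v)(value)
            for key, value in row.items()
        }


def write_json_array(lines, out):
    """Write all converted rows to out as a single, valid JSON array."""
    json.dump(list(convert_rows(lines)), out, sort_keys=True, indent=4)
```

Here lines can be an open CSV file and out an open file in write mode. If the spreadsheet starts with a header row, that row has to be skipped first, since int() would fail on the column names.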

Combining multiple CSV file into a single one

I have CSV files in which Data is formatted as follows:
file1.csv
ID,NAME
001,Jhon
002,Doe
file2.csv
ID,SCHOOLS_ATTENDED
001,my Nice School
002,His lovely school
file3.csv
ID,SALARY
001,25
002,40
The ID field is a kind of primary key that will be used to fetch records.
What is the most efficient way to read 3 to 4 files and get corresponding data and store in another CSV file having headings (ID,NAME,SCHOOLS_ATTENDED,SALARY)?
The file sizes are in the hundreds of MBs (100, 200 Mb).
Hundreds of megabytes aren't that much. Why not go for a simple approach using the csv module and collections.defaultdict:
import csv
from collections import defaultdict
result = defaultdict(dict)
fieldnames = {"ID"}
for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key) # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<type 'dict'>,
{'001': {'SALARY': '25', 'SCHOOLS_ATTENDED': 'my Nice School', 'NAME': 'Jhon'},
'002': {'SALARY': '40', 'SCHOOLS_ATTENDED': 'His lovely school', 'NAME': 'Doe'}})
You could then combine that into a CSV file (not my prettiest work, but good enough for now):
with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
out.csv then contains
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
The following is working code for combining multiple CSV files that have a specific keyword in their names into one final CSV file. I have set the default keyword to "file", but you can set it to an empty string if you want to combine all CSV files in folder_path. The code takes the header from the first CSV file and uses it as the header of the combined file. It ignores the headers of all other CSV files.
import csv
import glob
import os

def Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file(folder_path, keyword='file'):
    # Takes the header only from the 1st csv; all other csv headers are skipped and their data is appended to the final csv
    fileNames = glob.glob(folder_path + "*" + keyword + "*" + ".csv")  # fileNames includes folder_path too
    # csv.reader yields one list per row and csv.writer adds its own line terminator,
    # so the output is opened with newline='' to avoid doubled newlines
    with open(folder_path + "Combined_csv.csv", "w", newline='') as fout:
        print('Combining multiple csv files into 1')
        csv_write_file = csv.writer(fout, delimiter=',')
        with open(fileNames[0], mode='rt') as read_file:  # utf8
            csv_read_file = csv.reader(read_file, delimiter=',')
            csv_write_file.writerows(csv_read_file)
        for num in range(1, len(fileNames)):
            with open(fileNames[num], mode='rt') as read_file:  # utf8
                csv_read_file = csv.reader(read_file, delimiter=',')
                next(csv_read_file)  # ignore header
                csv_write_file.writerows(csv_read_file)
