Updating a schema-bound CSV from a collection of JSON documents - Python

I have a collection of JSON documents. I need to aggregate the data from all these documents into a portable format like CSV for easy access in Excel or other analytics tools.
The problem is that I create each JSON document by adding keys one by one, so the order of the keys is not predictable. I'm not sure that the CSV will keep its schema when I convert the documents (not a schema in the RDBMS sense, just the 2D column layout you see in Excel).
I just want to ensure that every time I update the CSV file with csv.writerow(), each value lands under the header that was set the first time.
Any ideas on how I can achieve this?

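One way is to use csv.DictWriter to create the CSV file. It writes each value under the column named by its key, so the key order inside a JSON document does not matter: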
import json
import csv
# Two JSON documents
jsondoc1 = '''{"a":"aardvark", "b":"bengal tiger"}'''
jsondoc2 = '''{"a":"Samuel Adams", "b":"Carter Braxton"}'''
# Create a CSV file, then use csv.DictWriter() to write the header
# and one row for each JSON document
with open("output.csv", "w", newline="") as output_file:
    writer = csv.DictWriter(output_file, fieldnames=["a", "b"])
    writer.writeheader()
    writer.writerow(json.loads(jsondoc1))
    writer.writerow(json.loads(jsondoc2))
Result:
a,b
aardvark,bengal tiger
Samuel Adams,Carter Braxton
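If the documents do not all carry the same keys, or the file is updated over several runs, a sketch along the following lines may help. The column list FIELDNAMES and the helper name append_documents are made up for illustration; restval and extrasaction are standard csv.DictWriter parameters.

```python
import csv
import json
import os

# Hypothetical fixed column order; every row is written in this order.
FIELDNAMES = ["a", "b"]


def append_documents(path, json_docs, fieldnames=FIELDNAMES):
    """Append one CSV row per JSON document, writing the header only once."""
    # Write the header only when the file is new or empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=fieldnames,
            restval="",             # value used for keys missing from a document
            extrasaction="ignore",  # skip keys that are not in the header
        )
        if need_header:
            writer.writeheader()
        for doc in json_docs:
            writer.writerow(json.loads(doc))
```

Calling append_documents("output.csv", [jsondoc1, jsondoc2]) repeatedly keeps adding rows under the same header. Ignoring unknown keys is a design choice; leaving extrasaction at its default ("raise") makes DictWriter raise a ValueError instead, which may be preferable if a new key should never go unnoticed.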

Related

convert excel to json file in python

I am new here and need some help with writing to a JSON file.
I have a dataframe with the values below, which was created by reading an Excel file.
I need to write this to a JSON file with the object as column dtls.
Output :
A similar task is considered in the question:
Converting Excel into JSON using Python
Different approaches are possible to solve this problem.
I hope this works for you:
import pandas as pd
import json

df = pd.read_excel('./TfidfVectorizer_sklearn.xlsx')
df.to_json('new_file1.json', orient='records')  # excel to json

# read json and then append details to it
with open('./new_file1.json', 'r') as json_file:
    a = {}
    data = json.load(json_file)
    a['details'] = data

# write new json with details in it
with open("./new_file1.json", "w") as jsonFile:
    json.dump(a, jsonFile)
JSON Output:

From grouped rows in Excel to json using python

Can someone please help me convert this Excel sheet to JSON using Python? I have tasks and subtasks, as in this picture link.
You can either use the xlrd library or the pandas library. Read the documentation of both and choose which would be best for you.
Pandas would look like this (stolen from someone else):
import pandas
import json
# Read excel document
excel_data_df = pandas.read_excel('data.xlsx', sheet_name='sheet1')
# Convert excel to string
# (define orientation of document in this case from up to down)
thisisjson = excel_data_df.to_json(orient='records')
# Print out the result
print('Excel Sheet to JSON:\n', thisisjson)
# Make the string into a list to be able to input in to a JSON-file
thisisjson_dict = json.loads(thisisjson)
# Define file to write to and 'w' for write option -> json.dump()
# defining the list to write from and file to write to
with open('data.json', 'w') as json_file:
    json.dump(thisisjson_dict, json_file)
Converting Excel into JSON using Python

How do I export JSON data to CSV using python?

I'm building a site that, based on a user's input, sorts through JSON data and prints a schedule for them into an HTML table. I want to add the ability, once their table is created, to export the data to a CSV/Excel file, so we don't have to store their credentials (logins and schedules) in a database. Is this possible? If so, how can I do it, preferably using Python?
This is not an exact answer, but rather steps for you to follow in order to get to a solution:
1 Read the data from JSON: some_dict = json.loads(json_string)
2 Write the code that pulls the data you need out of the dictionary (sorting, conditions, etc.) and puts it into a 2D array (a list of lists)
3 Save that list as CSV: https://realpython.com/python-csv/
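The three steps above can be sketched with only the standard library. The JSON content, the column names, and the sort key below are invented for illustration, and io.StringIO stands in for whatever file or HTTP response the site would actually write to.

```python
import csv
import io
import json

# Hypothetical input: a JSON array of schedule entries.
json_string = '[{"day": "Tue", "task": "lab"}, {"day": "Mon", "task": "lecture"}]'

# Step 1: parse the JSON text into Python objects.
entries = json.loads(json_string)

# Step 2: select and order the data, then flatten it into a 2D list.
columns = ["day", "task"]
ordered = sorted(entries, key=lambda entry: entry["task"])
rows = [[entry[column] for column in columns] for entry in ordered]

# Step 3: write the header and the rows as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Because the CSV text is built in memory, it can be returned to the user as a download without being stored on the server.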
I'm pretty lazy and like to utilize pandas for things like this. It would be something along the lines of
import json
import pandas as pd

file = 'data.json'
with open(file) as j:
    json_data = json.load(j)
# Build the frame from the parsed data, not from the file handle
df = pd.DataFrame.from_dict(json_data, orient='index')
df.to_csv("data.csv")

Writing value to given field in csv file using pandas or csv module

Is there any way to write a value to a specific place in a given .csv file using pandas or the csv module?
I have tried using csv_reader to read the file and find the line that fits my requirements, but I couldn't figure out how to replace the value in the file with mine.
What I am trying to achieve is this: I have a spreadsheet of names and values. I use JSON to update the values from the server, and after that I want to update my spreadsheet as well.
The latest solution I came up with was to create a separate sheet from which I would get the updated data, but it is not working, since there is no fixed order in which the dict is written to the file.
def updateSheet(fileName, aValues):
    with open(fileName+".csv") as workingSheet:
        writer = csv.DictWriter(workingSheet,aValues.keys())
        writer.writeheader()
        writer.writerow(aValues)
I will appreciate any guidance and tips.
You can try writing the CSV file this way:
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')
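For changing a single value in an existing file, one possible approach with the csv module is to read all rows, modify the matching one, and write the file back. This is only a sketch: the helper name update_value and the column names "name" and "value" are assumptions, and the file is assumed to have a header row.

```python
import csv


def update_value(path, name, new_value, key_column="name", value_column="value"):
    """Rewrite the CSV at `path`, replacing value_column where key_column equals `name`."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames  # keeps the original column order
        rows = list(reader)

    for row in rows:
        if row[key_column] == name:
            row[value_column] = new_value

    # Write everything back under the same header, in the same column order.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Reusing the header read from the file as fieldnames keeps every value under its original column, whatever order the keys of the updating dict arrive in.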

Reading csv from url and pushing it in DB through pandas

The URL returns CSV-formatted data. I am trying to get the data and push it into a database. However, I am unable to read the data: it only prints the header of the file, not the complete CSV data. Could there be a better option?
#!/usr/bin/python3
import pandas as pd
data = pd.read_csv("some-url")  # URL not provided due to security restrictions.
for row in data:
    print(row)
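Iterating over a DataFrame directly yields its column labels, which is why only the header is printed. You can iterate through the results of data.to_dict(orient="records") instead: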
data = pd.read_csv("some-url")
for row in data.to_dict(orient="records"):
    # On each iteration, `row` is a dict that maps each column name
    # to that row's value.
    # Use this dict to create a record for your db insert, e.g. as raw SQL or
    # to create an instance for an ORM like SQLAlchemy.
    print(row)
I do a similar thing to pre-format data for SQLAlchemy inserts, although I'm using Pandas to merge data from multiple sources rather than just reading the file.
Side note: There are plenty of other ways to do this without pandas, by simply iterating through the lines of the file. However, pandas's intuitive handling of CSVs makes it an attractive shortcut for what you need.
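As a rough illustration of the raw SQL route, the sketch below inserts a list of dicts into SQLite using named placeholders. The records are written out by hand in the shape that to_dict(orient="records") returns; the table name, the column names, and the in-memory database are assumptions for demonstration only.

```python
import sqlite3

# Sample records in the shape that DataFrame.to_dict(orient="records") returns;
# the column names and values are made up for illustration.
records = [
    {"english": "one", "number": 1},
    {"english": "two", "number": 2},
]

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute("CREATE TABLE numbers (english TEXT, number INTEGER)")

# Named placeholders (:english, :number) are filled from each dict's keys.
conn.executemany(
    "INSERT INTO numbers (english, number) VALUES (:english, :number)",
    records,
)
conn.commit()

stored = conn.execute(
    "SELECT english, number FROM numbers ORDER BY number"
).fetchall()
```

Placeholders let the driver handle quoting, which is generally safer than building the SQL string by hand from CSV values.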
