Download JSON data and convert it to CSV using Python

I'm currently using Yahoo Pipes, which provides me with a JSON file at a URL.
I would like to fetch it and convert it into a CSV file, and I have no idea where to begin (I'm a complete beginner in Python).
How can I fetch the JSON data from the URL?
How can I transform it to CSV?
Thank you

import urllib2
import json
import csv

def getRows(data):
    # ?? this totally depends on what's in your data
    return []

url = "http://www.yahoo.com/something"
data = urllib2.urlopen(url).read()
data = json.loads(data)

fname = "mydata.csv"
with open(fname, 'wb') as outf:
    outcsv = csv.writer(outf)
    outcsv.writerows(getRows(data))
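If you are on Python 3, here is a minimal sketch of the same idea using only the standard library, assuming the URL returns a JSON array of flat objects (that structure is an assumption, not something the question specifies; adjust the fields to whatever your feed actually contains):

```
# Python 3 sketch, assuming the URL returns a JSON array of flat objects
# like [{"title": ..., "link": ...}, ...] -- adjust to your actual feed.
import csv
import json
from urllib.request import urlopen

url = "http://www.yahoo.com/something"  # placeholder from the question
with urlopen(url) as resp:
    data = json.loads(resp.read().decode("utf-8"))

with open("mydata.csv", "w", newline="") as outf:
    writer = csv.DictWriter(outf, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
```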

Related

How to run a json file in python

The goal is to open a JSON file or website so that I can view earthquake data. I created a function that uses a dictionary and a list, but the terminal reports an error about an invalid argument. What is the best way to open a JSON file using Python?
import requests

def earthquake_daily_summary():
    req = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson")
    data = req.json()  # The .json() function will convert the json data from the server to a dictionary
    # Open json file
    f = open('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson')
    # returns Json object as a dictionary
    data = json.load(f)
    # Iterating through the json list
    for i in data['emp_details']:
        print(i)
    f.close()

print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()
You can immediately convert the response to json and read the data you need.
I didn't find the 'emp_details' key, so I replaced it with 'features'.
import requests

def earthquake_daily_summary():
    data = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson").json()
    for row in data['features']:
        print(row)

print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()
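If you only want a few fields rather than whole feature objects, each feature keeps its details under a 'properties' key in the USGS GeoJSON feed. A minimal sketch, assuming the usual 'mag' and 'place' property names:

```
# Sketch: print just magnitude and place for each event, assuming the
# usual USGS GeoJSON layout where details live under feature['properties'].
import requests

data = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson").json()
for feature in data['features']:
    props = feature['properties']
    print(props['mag'], props['place'])
```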

Converting xml to json for Mongo db

I am currently trying to convert an XML document with approximately 2k records to JSON to upload to MongoDB.
I have written a Python script for the conversion, but when I upload the result, MongoDB reads it as one document with 2k sub-objects, whereas I want 2k separate documents. My guess is the Python code is at fault. Can anyone help?
# Program to convert an xml file to a json file
# using the json module and the xmltodict module
import json
import xmltodict

# open the input xml file and read its data into a
# python dictionary using the xmltodict module
with open("test.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

# generate the json string corresponding to the dictionary
json_data = json.dumps(data_dict)

# write the json data to the output json file
with open("data.json", "w") as json_file:
    json_file.write(json_data)
# json_file.close()
I am not sure why you would expect an XML-to-JSON converter to automatically split the XML at "record" boundaries. After all, XML doesn't have a built-in concept of "records" - that's something in the semantics of your vocabulary, not in the syntax of XML.
The easiest way to split an XML file into multiple files is with a simple XSLT 2.0+ stylesheet. If you use XSLT 3.0 then you can invoke the JSON conversion at the same time.
Here is my solution.
import xmltodict
import json

# Open the xml file and parse it into a dictionary
with open(r"test.xml", "rb") as xml_file:
    dict_data = xmltodict.parse(xml_file)

# Keep only the list of records
output_data = dict_data["root"]["course_listing"]

json_data = json.dumps(output_data, indent=2)
print(json_data)

with open("datanew.json", "w") as json_file:
    json_file.write(json_data)
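To actually end up with 2k separate documents in MongoDB, one option is to skip the intermediate JSON file and insert the list of records directly with pymongo's insert_many, which creates one document per list entry. A sketch, assuming pymongo is installed and output_data is the list extracted above; the database and collection names are placeholders:

```
# Sketch: insert each record as its own document. Assumes pymongo is
# installed and output_data is the list of records extracted above.
# Database and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["courses"]
collection.insert_many(output_data)  # one document per record
```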

Print JSON data from csv list of multiple urls

Very new to Python and haven't found a specific answer on SO, so apologies in advance if this is very naive or covered elsewhere.
I am trying to print the 'IncorporationDate' JSON value from multiple URLs of a public data set. I have the URLs saved in a CSV file (snippet below). So far I can only print ALL the JSON data from one URL, and I am unsure how to run that over all of the CSV's URLs and write just the IncorporationDate values to a CSV.
Any basic guidance or edits are really welcome!
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen
import json

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url = ("http://data.companieshouse.gov.uk/doc/company/01046514.json")
print(get_jsonparsed_data(url))

import csv
with open('test.csv') as f:
    lis = [line.split() for line in f]
    for i, x in enumerate(lis):
        print()

import StringIO
s = StringIO.StringIO()
with open('example.csv', 'w') as f:
    for line in s:
        f.write(line)
Snippet of csv:
http://business.data.gov.uk/id/company/01046514.json
http://business.data.gov.uk/id/company/01751318.json
http://business.data.gov.uk/id/company/03164710.json
http://business.data.gov.uk/id/company/04403406.json
http://business.data.gov.uk/id/company/04405987.json
Welcome to the Python world.
For making HTTP requests we commonly use requests, because of its dead-simple API.
The code snippet below does what I believe you want:
It grabs the data from each of the urls you posted
It creates a new CSV file with each of the IncorporationDate keys.
```
import csv
import requests

COMPANY_URLS = [
    'http://business.data.gov.uk/id/company/01046514.json',
    'http://business.data.gov.uk/id/company/01751318.json',
    'http://business.data.gov.uk/id/company/03164710.json',
    'http://business.data.gov.uk/id/company/04403406.json',
    'http://business.data.gov.uk/id/company/04405987.json',
]

def get_company_data():
    for url in COMPANY_URLS:
        res = requests.get(url)
        if res.status_code == 200:
            yield res.json()

if __name__ == '__main__':
    for data in get_company_data():
        try:
            incorporation_date = data['primaryTopic']['IncorporationDate']
        except KeyError:
            continue
        else:
            with open('out.csv', 'a') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([incorporation_date])
```
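One refinement worth considering: the loop above reopens out.csv in append mode for every row, so rerunning the script keeps appending to the old file. A sketch that opens the file once instead:

```
# Sketch: open the output file once and reuse the writer,
# so a rerun starts from a fresh file.
with open('out.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for data in get_company_data():
        try:
            writer.writerow([data['primaryTopic']['IncorporationDate']])
        except KeyError:
            continue
```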
First step, read all the URLs in your CSV:
import csv

# csv.reader needs a file object, not a file name
with open('test.csv') as f:
    csvReader = csv.reader(f)
    # next(csvReader)  # uncomment if you have a header in the .CSV file
    all_urls = [row[0] for row in csvReader if row]  # first column holds the URL
Second step, fetch the data from a URL:
import json
from urllib.request import urlopen

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url_data = get_jsonparsed_data("give_your_url_here")
Third step:
Go through all the URLs you got from the CSV file
Get the JSON data for each one
Fetch the field you need, in your case "IncorporationDate"
Write it into an output CSV file, here named IncorporationDates.csv
Code below:
with open('IncorporationDates.csv', 'w') as abc:  # open once, not per URL, or each write overwrites the last
    for each_url in all_urls:
        url_data = get_jsonparsed_data(each_url)
        abc.write(url_data['primaryTopic']['IncorporationDate'] + '\n')

Write JSON Data From "Requests" Python Module to CSV

[Screenshot: JSON data output when printed in the command line]
I am currently pulling data via an API and am attempting to write the data into a CSV in order to run calculations in SQL. I can pull the data and open the CSV, but an error occurs when the data is written into the CSV: each individual character is separated by a comma.
I am new to working with JSON data, so I am curious whether I need an intermediate step between pulling the JSON data and inserting it into the CSV. Any help would be greatly appreciated, as I am completely stuck on this (even the data provider does not seem to know how to get around it).
Please see the code below:
import requests
import time
import pyodbc
import csv
import json

headers = {'Authorization': 'Token'}
Metric1 = ['Website1', 'Website2']
Metric2 = ['users', 'hours', 'responses', 'visits']
Metric3 = ['Country1', 'Country2', 'Country3']

obs_list = []
obs_file = r'TEST.csv'
with open(obs_file, 'w') as csvfile:
    f = csv.writer(csvfile)
    for elem1 in Metric1:
        for elem2 in Metric2:
            for elem3 in Metric3:
                URL = "www.data.com"
                r = requests.get(URL, headers=headers, verify=False)
                for elem in r:
                    f.writerow(elem)
Edit: when I print the data instead of writing it to a CSV, it appears in the command window in the following format:
[timestamp, metric], [timestamp, metric], [timestamp, metric] ...
where each timestamp is a 12-digit value and each metric is a decimal value.
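No answer was posted for this one, but the symptom (every character separated by a comma) is what happens when a string is passed to writerow(), which treats it as a sequence of one-character fields. A sketch of the likely fix, assuming the API returns JSON shaped like the [timestamp, metric] pairs described in the edit; the URL and token are placeholders from the question:

```
# Sketch: parse the response as JSON first, then write one row per
# [timestamp, metric] pair. URL and token are placeholders; the exact
# response shape is an assumption based on the question's edit.
import csv
import requests

headers = {'Authorization': 'Token'}
r = requests.get("https://www.data.com/endpoint", headers=headers)
rows = r.json()  # expected: [[timestamp, metric], [timestamp, metric], ...]

with open('TEST.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)  # each inner list becomes one CSV row
```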

how to clean a JSON file and store it to another file in Python

I am trying to read a JSON file with Python. This file is described by the authors as not strict JSON. In order to convert it to strict JSON, they suggest this approach:
import gzip  # needed for gzip.open below
import json

def parse(path):
    g = gzip.open(path, 'r')
    for l in g:
        yield json.dumps(eval(l))
However, not being familiar with Python, I am able to execute the script but I am not able to produce any output file with the new, clean JSON. How should I modify the script to produce a new JSON file? I have tried this:
import json

class Amazon():
    def parse(self, inpath, outpath):
        g = open(inpath, 'r')
        out = open(outpath, 'w')
        for l in g:
            yield json.dumps(eval(l), out)

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
but the output is an empty file. Any help is more than welcome.
import json

class Amazon():
    def parse(self, inpath, outpath):
        g = open(inpath, 'r')
        with open(outpath, 'w') as fout:
            for l in g:
                fout.write(json.dumps(eval(l)) + '\n')  # one JSON object per line

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
Another, shorter way of doing this:
import json

class Amazon():
    def parse(self, readpath, writepath):  # self was missing, so the call below would fail
        with open(readpath) as g, open(writepath, 'w') as fout:
            for l in g:
                json.dump(eval(l), fout)
                fout.write('\n')  # keep one object per line

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
When handling JSON data it is better to use the json module: json.dump(obj, output_file) for writing JSON to a file and json.load(file_object) for loading it. That way the JSON stays well-formed while saving and reading.
For very large amounts of data (say 1k+ records), consider the pandas module.
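For example, once the file is clean line-delimited JSON, pandas can load it in one call. A sketch, assuming 'cleaned.json' holds one object per line as produced above:

```
# Sketch: load line-delimited JSON with pandas and write it out as CSV.
# Assumes cleaned.json holds one JSON object per line, as produced above.
import pandas as pd

df = pd.read_json("cleaned.json", lines=True)
df.to_csv("cleaned.csv", index=False)
```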
