Very new to Python and I haven't found a specific answer on SO, so apologies in advance if this is naive or covered elsewhere already.
I am trying to print the 'IncorporationDate' JSON value from multiple URLs of a public data set. I have the URLs saved in a CSV file (snippet below). So far I can only print ALL the JSON data from one URL, and I am unsure how to run that over every URL in the CSV and write just the IncorporationDate values out to a CSV.
Any basic guidance or edits are really welcomed!
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

import json

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url = "http://data.companieshouse.gov.uk/doc/company/01046514.json"
print(get_jsonparsed_data(url))
import csv

with open('test.csv') as f:
    lis = [line.split() for line in f]

for i, x in enumerate(lis):
    print()

import StringIO
s = StringIO.StringIO()
with open('example.csv', 'w') as f:
    for line in s:
        f.write(line)
Snippet of csv:
http://business.data.gov.uk/id/company/01046514.json
http://business.data.gov.uk/id/company/01751318.json
http://business.data.gov.uk/id/company/03164710.json
http://business.data.gov.uk/id/company/04403406.json
http://business.data.gov.uk/id/company/04405987.json
Welcome to the Python world.
For making HTTP requests, we commonly use requests because of its dead-simple API.
The code snippet below does what I believe you want:
It grabs the data from each of the URLs you posted
It creates a new CSV file containing the IncorporationDate value from each.
```
import csv
import requests

COMPANY_URLS = [
    'http://business.data.gov.uk/id/company/01046514.json',
    'http://business.data.gov.uk/id/company/01751318.json',
    'http://business.data.gov.uk/id/company/03164710.json',
    'http://business.data.gov.uk/id/company/04403406.json',
    'http://business.data.gov.uk/id/company/04405987.json',
]

def get_company_data():
    for url in COMPANY_URLS:
        res = requests.get(url)
        if res.status_code == 200:
            yield res.json()

if __name__ == '__main__':
    for data in get_company_data():
        try:
            incorporation_date = data['primaryTopic']['IncorporationDate']
        except KeyError:
            continue
        else:
            with open('out.csv', 'a') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([incorporation_date])
```
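Since your URLs already live in test.csv, you could build COMPANY_URLS from the file instead of hardcoding it. A minimal sketch, assuming one URL per line and no header row:

```
import csv

# each row of test.csv is assumed to hold a single URL in its first column
with open('test.csv', newline='') as f:
    COMPANY_URLS = [row[0] for row in csv.reader(f) if row]
```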
First step, you have to read all the URLs in your CSV:

import csv

with open('test.csv') as f:
    csvReader = csv.reader(f)
    # next(csvReader)  # uncomment if you have a header in the CSV file
    all_urls = [row[0] for row in csvReader if row]

Note that csv.reader takes a file object, not a filename, and each row it yields is a list, so we take row[0] to get the URL string.
Second step, fetch the data from the URL:

import json
from urllib.request import urlopen

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url_data = get_jsonparsed_data("give_your_url_here")
Third step:
Go through all the URLs you got from the CSV file
Get the JSON data
Fetch the field you need, in your case "IncorporationDate"
Write it into an output CSV file; I'm naming it IncorporationDates.csv
Code below:
with open('IncorporationDates.csv', 'w') as abc:
    for each_url in all_urls:
        url_data = get_jsonparsed_data(each_url)
        # open the file once, outside the loop, so each iteration
        # appends a line instead of overwriting the whole file
        abc.write(url_data['primaryTopic']['IncorporationDate'] + '\n')
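If you later want more than one value per row, csv.writer handles the separators and quoting for you. A sketch along the same lines, reusing all_urls and get_jsonparsed_data from the steps above:

```
import csv

with open('IncorporationDates.csv', 'w', newline='') as abc:
    writer = csv.writer(abc)
    for each_url in all_urls:
        url_data = get_jsonparsed_data(each_url)
        # one row per company; add more columns to the list as needed
        writer.writerow([url_data['primaryTopic']['IncorporationDate']])
```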
Related
import json
import requests

def download_file(url):
    r = requests.get(url)
    filename = url.split('/')[-1]
    with open(filename, 'wb') as f:
        f.write(r.content)

api_url = 'https://api.fda.gov/download.json'
r = requests.get(api_url)
files = [file['file'] for file in json.loads(r.text)['results']['drug']['event']['partitions']]
count = 1
for file in files:
    download_file(file)
    print(f"{count}/{len(files)} downloaded!")
    count += 1
This is the other code:

import urllib.request, json

with urllib.request.urlopen("https://api.fda.gov/drug/label.json") as url:
    data = json.loads(url.read().decode())
    print(data)

The first code just downloads the files. I'm wondering if there's a way to not have to download any of the 1000+ files and just display them, so the code can be used locally. The second one prints the JSON in the terminal.
requests.get() and urllib.request.urlopen() both "download" the full response of the URL they are given.
If you do not want to "save" the file to disk, then remove the code that calls f.write()
More specifically,
import json
import requests

api_url = 'https://api.fda.gov/download.json'
r = requests.get(api_url)
files = [file['file'] for file in r.json()['results']['drug']['event']['partitions']]
total_files = len(files)
count = 0
for file in files:
    print(requests.get(file).content)
    print(f"{count+1}/{total_files} downloaded!")
    count += 1
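If the files are large, you can avoid holding each full response in memory by streaming it. A sketch using requests' stream=True, where process() is a hypothetical placeholder for whatever you do with each chunk:

```
import requests

def show_file(url):
    # stream=True defers downloading the body until you iterate over it
    with requests.get(url, stream=True) as r:
        for chunk in r.iter_content(chunk_size=8192):
            process(chunk)  # hypothetical: replace with your own handling
```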
from pip._vendor import requests
import csv
url = 'https://docs.google.com/spreadsheets/abcd'
dataReader = csv.reader(open(url), delimiter=',', quotechar='"')
exampleData = list(dataReader)
exampleData
Use Python Requests.
import requests
r = requests.get(url)
lines = r.text.splitlines()
We use splitlines to turn the text into an iterable, like a file handle. You should probably wrap it in a try/except block in case of errors.
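From there you can feed those lines straight into csv.reader, since it accepts any iterable of strings. A minimal sketch, assuming url points at your CSV:

```
import csv
import requests

r = requests.get(url)  # url is assumed to point at the CSV
dataReader = csv.reader(r.text.splitlines(), delimiter=',', quotechar='"')
exampleData = list(dataReader)
```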
You need to use something like urllib2 to retrieve the file.
For example:
import urllib2
import csv
csvfile = urllib2.urlopen('https://docs.google.com/spreadsheets/abcd')
dataReader = csv.reader(csvfile, delimiter=',', quotechar='"')
do_stuff(dataReader)
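On Python 3 the same idea works with urllib.request, though urlopen yields bytes while csv.reader wants text. A sketch using codecs.iterdecode to bridge the gap:

```
import csv
import codecs
import urllib.request

csvfile = urllib.request.urlopen('https://docs.google.com/spreadsheets/abcd')
# decode the byte stream line by line so csv.reader sees text
dataReader = csv.reader(codecs.iterdecode(csvfile, 'utf-8'), delimiter=',', quotechar='"')
do_stuff(dataReader)
```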
You can import urllib.request and then simply call data_stream = urllib.request.urlopen(url) to get a buffer of the file. You can then save the CSV data with data = str(data_stream.read()), which may be a bit unclean depending on your source or encoding, so you may need to do some manipulation; if not, you can just throw it into csv.reader(data, delimiter=',').
An example that translates from the byte format, which may work for you:
data = urllib.request.urlopen(url)
data_csv = str(data.read())
# split off the b' prefix from the string, then split at newlines up to the last one
dataReader = csv.reader(data_csv.split("b'", 1)[1].split("\\n")[:-1], delimiter=",")
headers = next(dataReader)
exampleData = list(dataReader)
#!/usr/bin/env python
import requests
import csv
import json
import sys

s = requests.Session()
r = s.get('https://onevideo.aol.com/sessions/login?un=username&pw=password')
r.status_code
if r.status_code == 200:
    print("Logged in successfully")
else:
    print("Check username and password")

filename = open('outputfile3.csv', 'w')
sys.stdout = filename

data = s.get('https://onevideo.aol.com/reporting/run_existing_report?report_id=102636').json()
json_input = json.load(data)
for entry in json_input:
    print(entry)
Your assignment of sys.stdout = filename is not idiomatic, so many people may not even understand exactly what you are doing. The key misunderstanding you appear to have: neither importing csv nor giving the output file a .csv extension makes Python automatically write valid CSV lines from a list of dictionaries (which is what the .json gets parsed as).
I will present a full example of how to write dictionary-like data, with some contrived JSON for reproducibility:
jsonstr = """
[{"field1": "property11", "field2": "property12"},
{"field1": "property21", "field2": "property22"},
{"field1": "property31", "field2": "property32"}]
"""
First, using only the standard library:
import json
import csv

data = json.loads(jsonstr)

with open('outputfile3.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=['field1', 'field2'])
    writer.writeheader()
    writer.writerows(data)
Then much more succinctly using pandas:
import pandas
pandas.read_json(jsonstr).to_csv('outputfile3.csv', index=False)
I have this script which extracts the JSON objects from the webpage. The JSON objects are converted into dictionaries. Now I need to write those dictionaries to a file. Here's my code:
#!/usr/bin/python
import requests

r = requests.get('https://github.com/timeline.json')
for item in r.json or []:
    print item['repository']['name']
There are ten lines in the file. I need to write the dictionaries to that file, one per line. How do I do that? Thanks.
To address the original question, something like:
with open("pathtomyfile", "w") as f:
for item in r.json or []:
try:
f.write(item['repository']['name'] + "\n")
except KeyError: # you might have to adjust what you are writing accordingly
pass # or sth ..
Note that not every item will be a repository; there are also gist events, among others.
Better would be to just save the JSON to a file:
#!/usr/bin/python
import json
import requests

r = requests.get('https://github.com/timeline.json')
with open("yourfilepath.json", "w") as f:
    f.write(json.dumps(r.json))
Then, you can open it:

with open("yourfilepath.json", "r") as f:
    obj = json.loads(f.read())
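One caveat: in current versions of requests, json is a method rather than an attribute, and json.dump can serialize straight to the file handle. A sketch of the same idea on a modern setup:

```
import json
import requests

r = requests.get('https://github.com/timeline.json')
with open("yourfilepath.json", "w") as f:
    json.dump(r.json(), f)  # r.json() parses the body; json.dump writes it to f
```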
I'm currently using Yahoo Pipes, which provides me with a JSON file from a URL.
I would like to be able to fetch it and convert it into a CSV file, and I have no idea where to begin (I'm a complete beginner in Python).
How can I fetch the JSON data from the URL?
How can I transform it to CSV?
Thank you
import urllib2
import json
import csv

def getRows(data):
    # ?? this totally depends on what's in your data
    return []

url = "http://www.yahoo.com/something"
data = urllib2.urlopen(url).read()
data = json.loads(data)

fname = "mydata.csv"
with open(fname, 'wb') as outf:
    outcsv = csv.writer(outf)
    outcsv.writerows(getRows(data))
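For instance, if the Pipes JSON held a top-level "items" list whose entries have "title" and "link" keys (a hypothetical structure; adjust it to your actual feed), getRows could look like this:

```
def getRows(data):
    # hypothetical structure: {"items": [{"title": ..., "link": ...}, ...]}
    return [[item['title'], item['link']] for item in data.get('items', [])]
```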