JSON data output when printed in command line I am currently pulling data via an API and am attempting to write the data into a CSV in order to run calculations in SQL. I am currently able to pull the data, open the CSV, however an error occurs when the data is being written into the CSV. The error is that each individual character is separated by a comma.
I am new to working with JSON data so I am curious if I need to perform an intermediary step between pulling the JSON data and inserting it into a CSV. Any help would be greatly appreciated as I am completely stuck on this (even the data provider does not seem to know how to get around this).
Please see the code below:
import requests
import time
import pyodbc
import csv
import json
headers = {'Authorization': 'Token'}
Metric1 = ['Website1','Website2']
Metric2 = ['users','hours','responses','visits']
Metric3 = ['Country1','Country2','Country3']
obs_list = []
obs_file = r'TEST.csv'
with open(obs_file, 'w') as csvfile:
f=csv.writer(csvfile)
for elem1 in Metric1:
for elem2 in Metric2:
for elem3 in Metric3:
URL = "www.data.com"
r = requests.get(URL, headers=headers, verify=False)
for elem in r:
f.writerow(elem) `
Edit: When I print the data instead of writing it to a CSV, the data appears in the command window in the following format:
[timestamp, metric], [timestamp, metric], [timestamp, metric] ...
Timestamp = 12 digit character
Metric = decimal value
Related
The goal is to open a json file or websites so that I can view earthquake data. I create a json function that use dictionary and a list but within the terminal an error appears as a invalid argument. What is the best way to open a json file using python?
import requests
`def earthquake_daily_summary():
req = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson")
data = req.json() # The .json() function will convert the json data from the server to a dictionary
# Open json file
f = open('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson')
# returns Json oject as a dictionary
data = json.load(f)
# Iterating through the json
# list
for i in data['emp_details']:
print(i)
f.close()
print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()`
You can immediately convert the response to json and read the data you need.
I didn't find the 'emp_details' key, so I replaced it with 'features'.
import requests
def earthquake_daily_summary():
data = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson").json()
for row in data['features']:
print(row)
print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()
I am not an expert in Python but I used it to call data from an API. I got a code 200 and printed part of the data but I am not able to export/ save/ write the output (to CSV). Can anyone assist?
This is my code:
import requests
headers = {
'Accept': 'text/csv',
'Authorization': 'Bearer ...'
}
response = requests.get('https://feeds.preqin.com/api/investor/pe', headers=headers)
print response
output = response.content
And here is how the data (should be CSV, correct?) looks like:
enter image description here
I managed to save it as txt but the output is not usable/ importable (e.g. to Excel). I used the following code:
text_file = open("output.txt", "w")
n = text_file.write(output)
text_file.close()
Thank you and best regards,
A
Your content uses pipes | as a separator. CSVs use , commas (that's why they're called Comma Separated Values).
You can simply replace your data's pipes with commas. However, this may be problematic if the data you have itself uses commas.
output = response.content.replace("|", ",")
As comments have suggested, you could also use pandas:
import pandas as pd
from StringIO import StringIO
# Get your output normally...
output = response.content
df = pd.read_csv(StringIO(output), sep="|")
# Saving to .CSV
df.to_csv(r"C:\output.csv")
# Saving to .XLSX
df.to_excel(r"C:\output.xlsx")
I am trying to bulk download movie information from The Movie Database. The preferred method mentioned on their website is to loop through movie IDs from 1 until the most recent movie ID. When I pull individual movies using their ID, I get the entire set of information. However, when I pull it into a loop, I receive an error 34, resource cannot be found. For my example, I picked specifically a movie ID that I have grabbed individual (Skyfall, 37724), which returns the resource cannot be found error.
import requests
dataset = []
for i in range(37724, 37725):
url = 'https://api.themoviedb.org/3/movie/x?api_key=*****&language=en-US'
movieurl = url[:35] + str(i) + url[36:]
payload = "{}"
response = requests.request("GET", url, data=payload)
data = response.json()
dataset.append(data)
print(movieurl)
dataset
[ANSWERED] 1) Is there a reason for why the loop cannot pull the information? Is this a programming question or specific to the API?
2) Is the way my code set up the best to pull the information and store it in bulk? My ultimate goal is to create a CSV file with the data.
Your request uses url, while your actual url is in the movieurl variable.
To write your data to csv, I would recommend the python csv DictWriter, as your data are dicts (response.json() produces a dict).
BONUS: If you want to format a string, use the string.format method:
url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
this is much more robust.
The working, improved version of your code, with writing to csv would be:
import csv
import requests
with open('output.csv', 'w') as csvfile:
writer = csv.DictWriter(csvfile)
for i in range(37724, 37725):
url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
payload = "{}"
response = requests.request("GET", url, data=payload)
writer.writerow(response.json())
I am trying to write a script (Python 2.7.11, Windows 10) to collect data from an API and append it to a csv file.
The API I want to use returns data in json.
It limits the # of displayed records though, and pages them.
So there is a max number of records you can get with a single query, and then you have to run another query, changing the page number.
The API informs you about the nr of pages a dataset is divided to.
Let's assume that the max # of records per page is 100 and the nr of pages is 2.
My script:
import json
import urllib2
import csv
url = "https://some_api_address?page="
limit = "&limit=100"
myfile = open('C:\Python27\myscripts\somefile.csv', 'ab')
def api_iterate():
for i in xrange(1, 2, 1):
parse_url = url,(i),limit
json_page = urllib2.urlopen(parse_url)
data = json.load(json_page)
for item in data['someobject']:
print item ['some_item1'], ['some_item2'], ['some_item3']
f = csv.writer(myfile)
for row in data:
f.writerow([str(row)])
This does not seem to work, i.e. it creates a csv file, but the file is not populated. There is obviously something wrong with either the part of the script which builds the address for the query OR the part dealing with reading json OR the part dealing with writing query to csv. Or all of them.
I have tried using other resources and tutorials, but at some point I got stuck and I would appreciate your assistance.
The url you have given provides a link to the next page as one of the objects. You can use this to iterate automatically over all of the pages.
The script below gets each page, extracts two of the entries from the Dataobject array and writes them to an output.csv file:
import json
import urllib2
import csv
def api_iterate(myfile):
url = "https://api-v3.mojepanstwo.pl/dane/krs_osoby"
csv_myfile = csv.writer(myfile)
cols = ['id', 'url']
csv_myfile.writerow(cols) # Write a header
while True:
print url
json_page = urllib2.urlopen(url)
data = json.load(json_page)
json_page.close()
for data_object in data['Dataobject']:
csv_myfile.writerow([data_object[col] for col in cols])
try:
url = data['Links']['next'] # Get the next url
except KeyError as e:
break
with open(r'e:\python temp\output.csv', 'wb') as myfile:
api_iterate(myfile)
This will give you an output file looking something like:
id,url
1347854,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1347854
1296239,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1296239
705217,https://api-v3.mojepanstwo.pl/dane/krs_osoby/705217
802970,https://api-v3.mojepanstwo.pl/dane/krs_osoby/802970
I'm currently using Yahoo Pipes which provides me with a JSON file from an URL.
I would like to be able to fetch it and convert it into a CSV file, and I have no idea where to begin (I'm a complete beginner in Python).
How can I fetch the JSON data from the URL?
How can I transform it to CSV?
Thank you
import urllib2
import json
import csv
def getRows(data):
# ?? this totally depends on what's in your data
return []
url = "http://www.yahoo.com/something"
data = urllib2.urlopen(url).read()
data = json.loads(data)
fname = "mydata.csv"
with open(fname,'wb') as outf:
outcsv = csv.writer(outf)
outcsv.writerows(getRows(data))