CSV or JSON format - python

import base64
import requests

USERNAME, PASSWORD = 'notworking', 'notworking'

def send_request():
    # Request
    try:
        response = requests.get(
            url="https://api.mysportsfeeds.com/v1.1/pull/nhl/2017-2018-regular/cumulative_player_stats.{format}",
            params={
                "fordate": "20171009"
            },
            headers={
                "Authorization": "Basic " +
                    base64.b64encode('{}:{}'.format(USERNAME, PASSWORD)
                                     .encode('utf-8')).decode('ascii')
            }
        )
        print('Response HTTP Status Code: {status_code}'.format(
            status_code=response.status_code))
        print('Response HTTP Response Body: {content}'.format(
            content=response.content))
    except requests.exceptions.RequestException:
        print('HTTP Request failed')
That code allows me to pull data from mysportsfeeds.com. Eventually, I will need to take the output of the send_request function and format it as a .xlsx file with the openpyxl library. I don't know which format will be easier to work with: the CSV output or the JSON output.
Their website shows how to get the output of cumulative_player_stats.
For instance,
https://api.mysportsfeeds.com/v1.1/pull/nhl/2016-2017-regular/cumulative_player_stats.{format}
where {format} is either csv or json
Questions:
Which is the better choice, CSV or JSON output, so that it will work well with the openpyxl library? Could anyone show me how it could work with CSV (using the csv library) and with JSON (using the json library), together with openpyxl?

Excel is a row-based file format, which would suggest CSV, which is also row-based. But CSV files are text-only, meaning that they contain no type information, and you have to guess whether "9/10/17" means the 9th of October (20)17, the 10th of September (20)17, or simply the literal string "9/10/17".
JSON is at least typed, but it will need to be read into memory all at once. Assuming the payload is merely a list of lists, that is not much of a drawback, because Excel worksheets cannot have more than about a million rows anyway, so JSON is probably the best option here.
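To give an idea of the JSON route, here is a minimal sketch. The payload below is illustrative only; the real cumulative_player_stats schema is different and should be checked against the API docs.

```python
import json
from openpyxl import Workbook

# Hypothetical JSON payload standing in for response.content;
# the real cumulative_player_stats schema will differ.
raw = '{"playerstats": [{"name": "Player A", "goals": 5}, {"name": "Player B", "goals": 3}]}'
stats = json.loads(raw)

wb = Workbook()
ws = wb.active
ws.append(["name", "goals"])  # header row
for player in stats["playerstats"]:
    ws.append([player["name"], player["goals"]])
wb.save("stats.xlsx")
```

Each dict from the parsed JSON becomes one worksheet row via ws.append(), and numeric values stay numeric in the resulting .xlsx, which is the main advantage over the CSV route.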

Related

Exporting API output response to CSV - Python

I am not an expert in Python but I used it to call data from an API. I got a code 200 and printed part of the data but I am not able to export/ save/ write the output (to CSV). Can anyone assist?
This is my code:
import requests

headers = {
    'Accept': 'text/csv',
    'Authorization': 'Bearer ...'
}
response = requests.get('https://feeds.preqin.com/api/investor/pe', headers=headers)
print(response)
output = response.content
And here is what the data (it should be CSV, correct?) looks like:
I managed to save it as a .txt file, but the output is not usable/importable (e.g. into Excel). I used the following code:
text_file = open("output.txt", "wb")  # output is bytes, so open in binary mode
n = text_file.write(output)
text_file.close()
Thank you and best regards,
A
Your content uses pipes | as separators, while CSVs use commas , (that's why they're called Comma-Separated Values).
You can simply replace your data's pipes with commas. However, this may be problematic if the data itself contains commas. Note that in Python 3 response.content is bytes, so work with the decoded text instead:
output = response.text.replace("|", ",")
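A more robust route (a sketch, assuming the body is plain pipe-delimited text standing in for response.text) is to let the csv module parse on | and re-write with commas, which also handles quoting correctly:

```python
import csv
import io

# Hypothetical pipe-delimited content standing in for response.text
raw = "Firm|City\nAcme Capital|London\nBeta Partners|New York\n"

rows = list(csv.reader(io.StringIO(raw), delimiter="|"))
with open("output.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)  # written back out comma-separated
```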
As comments have suggested, you could also use pandas:
import pandas as pd
from io import StringIO  # StringIO moved into the io module in Python 3

# Get your output normally...
output = response.text
df = pd.read_csv(StringIO(output), sep="|")

# Saving to .csv
df.to_csv(r"C:\output.csv", index=False)
# Saving to .xlsx (requires an Excel writer such as openpyxl)
df.to_excel(r"C:\output.xlsx", index=False)

Read a specific value from a json file with Python

I am getting a JSON file from a curl request and I want to read a specific value from it.
Suppose that I have a JSON file, like the following one. How can I insert the "result_count" value into a variable?
Currently, after getting the response from curl, I am writing the JSON objects into a txt file like this.
json_response = connect_to_endpoint(url, headers)
f.write(json.dumps(json_response, indent=4, sort_keys=True))
Your json_response isn't JSON content (JSON is a formatted string), but a Python dict, so you can access it using its keys:
res_count = json_response['meta']['result_count']
Use the json module from the Python standard library.
data itself is just a Python dictionary, and can be accessed as such.
import json

with open('path/to/file/filename.json') as f:
    data = json.load(f)

result_count = data['meta']['result_count']
You can parse a JSON string using the json.loads() method of the json module. If connect_to_endpoint() returns a string, that looks like:
response = connect_to_endpoint(url, headers)
json_response = json.loads(response)
After that you can extract an element by specifying its keys in brackets:
result_count = json_response['meta']['result_count']

Uploading a csv type data using python request.put without reading from a saved csv file?

I have an API endpoint where I am uploading data to using Python. The endpoint accepts
putHeaders = {
    'Authorization': user,
    'Content-Type': 'application/octet-stream'
}
My current code is doing this:
- Save a dictionary as a CSV file
- Encode the CSV to UTF-8
dataFile = open(fileData['name'], 'r').read().encode('utf-8')
- Upload the file to the API endpoint
fileUpload = requests.put(url,
                          headers=putHeaders,
                          data=dataFile)
What I am trying to achieve is loading the data without saving it to disk.
So far I have tried converting my dictionary to bytes using
data = json.dumps(payload).encode('utf-8')
and sending that to the API endpoint. This works, but the output at the API endpoint is not correct.
Question
Does anyone know how to upload CSV-type data without actually saving a file?
EDIT: use io.StringIO() as your file-like object when writing your dict to CSV. Then call getvalue() and pass that as your data param to requests.put().
See this question for more details: How do I write data into CSV format as string (not file)?.
Old answer:
If your dict is this:
my_dict = {'col1': 1, 'col2': 2}
then you could convert it to a csv format like so:
csv_data = ','.join(my_dict.keys())
csv_data += '\n' + ','.join(str(v) for v in my_dict.values())  # values must be strings to join
csv_data = csv_data.encode('utf8')
And then do your requests.put() call with data=csv_data.
Updated answer
I hadn't realized your input was a dictionary, you had mentioned the dictionary was being saved as a file. I assumed the dictionary lookup in your code was referencing a file. More work needs to be done if you want to go from a dict to a CSV file-like object.
Based on the I/O from your question, it appears that your input dictionary has this structure:
file_data = {"name": {"Col1": 1, "Col2": 2}}
Given that, I'd suggest trying the following using csv and io:
import csv
import io
import requests

session = requests.Session()
session.headers.update(
    {"Authorization": user, "Content-Type": "application/octet-stream"}
)

file_data = {"name": {"Col1": 1, "Col2": 2}}

with io.StringIO() as f:
    name = file_data["name"]
    writer = csv.DictWriter(f, fieldnames=name)
    writer.writeheader()
    writer.writerows([name])  # `name` is a dict, but DictWriter expects a list of dicts
    f.seek(0)  # rewind so the PUT sends the buffer from the beginning
    response = session.put(url, data=f)
You may want to test using the correct MIME type passed in the request header. While the endpoint may not care, it's best practice to use the correct type for the data. CSV should be text/csv. Python also provides a MIME types module:
>>> import mimetypes
>>>
>>> mimetypes.types_map[".csv"]
'text/csv'
Original answer
Just open the file in bytes mode rather than worrying about encoding or reading it into memory.
Additionally, use a context manager to handle the file rather than assigning it to a variable, and set your headers on a Session object so you don't have to repeatedly pass header data in your request calls.
Documentation on the PUT method:
https://requests.readthedocs.io/en/master/api/#requests.put
data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
import requests

session = requests.Session()
session.headers.update(
    {"Authorization": user, "Content-Type": "application/octet-stream"}
)

with open(file_data["name"], "rb") as f:
    response = session.put(url, data=f)
Note: I modified your code to more closely follow python style guides.

Looping through IDs for with API

I am trying to bulk download movie information from The Movie Database. The preferred method mentioned on their website is to loop through movie IDs from 1 up to the most recent movie ID. When I pull individual movies using their ID, I get the entire set of information. However, when I pull it in a loop, I receive error 34, resource cannot be found. For my example, I picked a movie ID that I have grabbed individually (Skyfall, 37724), and it still returns the resource-cannot-be-found error.
import requests

dataset = []
for i in range(37724, 37725):
    url = 'https://api.themoviedb.org/3/movie/x?api_key=*****&language=en-US'
    movieurl = url[:35] + str(i) + url[36:]
    payload = "{}"
    response = requests.request("GET", url, data=payload)
    data = response.json()
    dataset.append(data)
    print(movieurl)
dataset
[ANSWERED] 1) Is there a reason why the loop cannot pull the information? Is this a programming question or something specific to the API?
2) Is the way my code set up the best to pull the information and store it in bulk? My ultimate goal is to create a CSV file with the data.
Your request uses url, while your actual url is in the movieurl variable.
To write your data to csv, I would recommend the python csv DictWriter, as your data are dicts (response.json() produces a dict).
BONUS: If you want to format a string, use the string.format method:
url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
This is much more robust.
The working, improved version of your code, with writing to CSV, would be:
import csv
import requests

rows = []
for i in range(37724, 37725):
    url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
    response = requests.get(url)
    rows.append(response.json())

with open('output.csv', 'w', newline='') as csvfile:
    # DictWriter requires field names; take them from the first record
    writer = csv.DictWriter(csvfile, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

Write JSON Data From "Requests" Python Module to CSV

[Screenshot: JSON data output when printed in the command line]
I am currently pulling data via an API and am attempting to write the data into a CSV in order to run calculations in SQL. I am currently able to pull the data and open the CSV; however, an error occurs when the data is written into the CSV: each individual character is separated by a comma.
I am new to working with JSON data, so I am curious whether I need to perform an intermediary step between pulling the JSON data and inserting it into a CSV. Any help would be greatly appreciated, as I am completely stuck on this (even the data provider does not seem to know how to get around it).
Please see the code below:
import requests
import time
import pyodbc
import csv
import json
headers = {'Authorization': 'Token'}
Metric1 = ['Website1','Website2']
Metric2 = ['users','hours','responses','visits']
Metric3 = ['Country1','Country2','Country3']
obs_list = []
obs_file = r'TEST.csv'
with open(obs_file, 'w') as csvfile:
    f = csv.writer(csvfile)
    for elem1 in Metric1:
        for elem2 in Metric2:
            for elem3 in Metric3:
                URL = "www.data.com"
                r = requests.get(URL, headers=headers, verify=False)
                for elem in r:
                    f.writerow(elem)
Edit: When I print the data instead of writing it to a CSV, the data appears in the command window in the following format:
[timestamp, metric], [timestamp, metric], [timestamp, metric] ...
Timestamp = 12-digit string
Metric = decimal value
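For what it's worth, iterating over a Response object yields raw chunks, and passing a string to writerow writes one character per cell, which matches the symptom described. A sketch of the usual fix (the sample body below is hypothetical, shaped like the [timestamp, metric] pairs above; in the real code it would come from r.json()) is to parse the JSON first and write the resulting rows:

```python
import csv
import json

# Hypothetical response body, shaped like the pairs described above;
# in the real code this would be data = r.json()
body = '[["201801010000", 1.5], ["201801010100", 2.25]]'
data = json.loads(body)

with open("TEST.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["timestamp", "metric"])  # header
    writer.writerows(data)                    # one row per [timestamp, metric] pair
```

writerows expects an iterable of rows, so each [timestamp, metric] pair lands in its own line with one value per column instead of one character per column.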
