I am trying to bulk download movie information from The Movie Database. The preferred method mentioned on their website is to loop through movie IDs from 1 up to the most recent movie ID. When I pull individual movies by ID, I get the entire set of information. However, when I do the same in a loop, I receive error 34, resource cannot be found. For my example, I specifically picked a movie ID that I have already fetched individually (Skyfall, 37724), and it still returns the resource cannot be found error.
import requests
dataset = []
for i in range(37724, 37725):
    url = 'https://api.themoviedb.org/3/movie/x?api_key=*****&language=en-US'
    movieurl = url[:35] + str(i) + url[36:]
    payload = "{}"
    response = requests.request("GET", url, data=payload)
    data = response.json()
    dataset.append(data)
    print(movieurl)
dataset
1) Is there a reason why the loop cannot pull the information? Is this a programming problem or something specific to the API?
2) Is the way my code is set up the best way to pull the information and store it in bulk? My ultimate goal is to create a CSV file with the data.
Your request uses url, which still contains the literal x placeholder. The URL with the movie ID substituted in is stored in the movieurl variable, so that is the one to pass to requests.
To write your data to CSV, I would recommend csv.DictWriter from the Python standard library, as your data are dicts (response.json() produces a dict).
BONUS: If you want to format a string, use the str.format method:
url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
This is much more robust than slicing the string by position.
The working, improved version of your code, with writing to CSV, would be:
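import csv
import requests
with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = None
    for i in range(37724, 37725):
        url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
        response = requests.get(url)
        if response.status_code != 200:
            continue  # skip IDs for which the API returns an error instead of a movie
        movie = response.json()
        if writer is None:
            # DictWriter needs its column names; take them from the first movie
            writer = csv.DictWriter(csvfile, fieldnames=list(movie), extrasaction='ignore')
            writer.writeheader()
        writer.writerow(movie)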
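When looping over many IDs, the returned dicts may not all share the same keys, and csv.DictWriter needs its column names (fieldnames) at the moment it is created. A minimal sketch of one way around that, assuming the responses are first collected into a list and written in a second pass. The helper name write_rows and the sample dicts are made up for illustration and are not real TMDB output:

```python
import csv
import io


def write_rows(rows, csvfile):
    """Write a list of dicts to csvfile, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen order
    # restval fills in columns that a particular row does not have
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Hypothetical sample data standing in for a list of response.json() results
sample = [
    {'id': 1, 'title': 'First'},
    {'id': 2, 'title': 'Second', 'runtime': 90},
]
buffer = io.StringIO()
columns = write_rows(sample, buffer)
print(buffer.getvalue())
```

For a real run, the io.StringIO buffer would be replaced by a file opened with newline='' so the csv module controls the line endings itself.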
Related
I am doing this for the first time and so far have set up a simple script to fetch 2 columns of data from an API. The data comes through and I can see it with the print command. Now I am trying to write it to CSV and have set up the code below, which creates the file, but I can't figure out how to:
1. Remove the blank lines in between each data row
2. Add delimiters to the data, which I want to be " "
3. If a value such as IP is blank, then just show " "
I searched and tried all sorts of examples but just keep getting errors. My code snippet, which writes the CSV successfully, is:
import requests
import csv
import json
# Make an API call and store response
url = 'https://api-url-goes-here.com'
filename = "test.csv"
headers = {
    'accept': 'application/json',
}
r = requests.get(url, headers=headers, auth=('User','PWD'))
print(f"Status code: {r.status_code}")
#Store API response in a variable
response_dict = r.json()
#Open a File for Writing
f = csv.writer(open(filename, "w", encoding='utf8'))
# Write CSV Header
f.writerow(["Computer_Name", "IP_Addresses"])
for computer in response_dict["advanced_computer_search"]["computers"]:
    f.writerow([computer["Computer_Name"],computer["IP_Addresses"]])
CSV output I get looks like this:
Computer_Name,IP_Addresses
HYDM002543514,
HYDM002543513,10.93.96.144 - AirPort - en1
HYDM002544581,192.168.1.8 - AirPort - en1 / 10.93.224.177 -
GlobalProtect - gpd0
HYDM002544580,10.93.80.101 - Ethernet - en0
HYDM002543515,192.168.0.6 - AirPort - en0 / 10.91.224.58 -
GlobalProtect - gpd0
CHAM002369458,10.209.5.3 - Ethernet - en0
CHAM002370188,192.168.0.148 - AirPort - en0 / 10.125.91.23 -
GlobalProtect - gpd0
MacBook-Pro,
I tried adding
csv.writer(f, delimiter =' ',quotechar =',',quoting=csv.QUOTE_MINIMAL)
after the f = csv.writer line, but that raises an error: TypeError: argument 1 must have a "write" method
I am sure it's something simple, but I just can't find the correct solution to implement in the code I have. Any help is appreciated.
Also, does the file get closed automatically? Some examples suggest to use something like f.close() but that causes errors. Do I need it? The file seems to get created fine as-is.
I suggest using the pandas package to write the .csv file; it is one of the most widely used packages for data analysis.
For your problem:
import requests
import csv
import json
import pandas
# Make an API call and store response
url = 'https://api-url-goes-here.com'
filename = "test.csv"
headers = {
    'accept': 'application/json',
}
r = requests.get(url, headers=headers, auth=('User','PWD'))
print(f"Status code: {r.status_code}")
#Store API response in a variable
response_dict = r.json()
#collect data to build pandas.DataFrame
data = []
for computer in response_dict["advanced_computer_search"]["computers"]:
    # filter blank line
    if computer["Computer_Name"] or computer["IP_Addresses"]:
        data.append({"Computer_Name":computer["Computer_Name"],"IP_Addresses":computer["IP_Addresses"]})
pandas.DataFrame(data=data).to_csv(filename, index=False)
If you want to use " " to separate values, you can set sep=" " in the last line, which outputs the .csv file. However, I recommend using , as the delimiter because it is the common standard. Many more options can be set for the DataFrame.to_csv() method; see the official docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
As you said in a comment, pandas is not part of the Python standard library. You can instead simply open a file and write lines that you build manually. For example:
import requests
# Make an API call and store response
url = 'https://api-url-goes-here.com'
filename = "test.csv"
headers = {
    'accept': 'application/json',
}
r = requests.get(url, headers=headers, auth=('User','PWD'))
print(f"Status code: {r.status_code}")
#Store API response in a variable
response_dict = r.json()
#Open a File for Writing; the with block closes it automatically
with open(filename, mode='w', encoding='utf8') as f:
    # Write CSV Header
    f.write("Computer_Name,"+"IP_Addresses"+"\n")
    for computer in response_dict["advanced_computer_search"]["computers"]:
        # filter blank line
        if computer["Computer_Name"] or computer["IP_Addresses"]:
            f.write("\""+computer["Computer_Name"]+"\","+"\""+computer["IP_Addresses"]+"\"\n")
Note that the " around each value is produced by appending \", and \n starts a new line after each row.
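The same result is also possible with the standard csv module alone. A minimal sketch, assuming the blank lines come from opening the file without newline='' (the usual cause on Windows) and that "delimiters" means wrapping every value in double quotes. The function name write_computers and the sample records are hypothetical:

```python
import csv
import os
import tempfile


def write_computers(computers, filename):
    # newline='' lets the csv module write its own line endings,
    # which avoids the extra blank line between rows on Windows
    with open(filename, "w", newline="", encoding="utf8") as handle:
        # QUOTE_ALL wraps every field in double quotes
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow(["Computer_Name", "IP_Addresses"])
        for computer in computers:
            # a missing or empty IP is written as an empty quoted field
            ip = computer.get("IP_Addresses") or ""
            writer.writerow([computer["Computer_Name"], ip])
    # leaving the with block closes the file, so no f.close() is needed


# Hypothetical records shaped like the entries in the question
sample = [
    {"Computer_Name": "HYDM002543514", "IP_Addresses": ""},
    {"Computer_Name": "HYDM002543513", "IP_Addresses": "10.93.96.144 - AirPort - en1"},
]
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_computers(sample, path)
with open(path, newline="", encoding="utf8") as handle:
    written = handle.read()
print(written)
```

The TypeError in the question arises because f was already a csv.writer rather than a file object, so it cannot be passed to csv.writer a second time; options such as quoting belong on the first csv.writer call.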
I have used Python to call NewsAPI to get the latest news that I need, and I have saved the result into a text file called "NewsAPI.txt".
My code is:
import json
import requests
def newsAPI():
    url = ('https://newsapi.org/v2/everything?' #API URL
           'q=procurement AND tender&' #keywords on procurement AND tender
           'sortBy=popularity&' #Sort them by popularity
           'apiKey=***') #Personal API key
    # GET
    response = requests.get(url)
    #storing the output into variable "results"
    results = response.json()
    # save the JSON output into a txt file for future usage
    with open("NewsAPI.txt", "w") as text_file:
        json.dump(results, text_file)
After calling json.dump, it gets saved into the "NewsAPI.txt" file as I have mentioned. But I'm having trouble putting it into a treeview in Tkinter, or am I using the wrong widget to display them?
Output data:
I'm trying to download images from a list of URLs. Each URL contains a txt file with jpeg information. The URLs are uniform except for an incremental change in the folder number. Below are example URLs:
Min: https://marco.ccr.buffalo.edu/data/train/train-00001-of-00407
Max: https://marco.ccr.buffalo.edu/data/train/train-00407-of-00407
I want to read each of these URLs and store their output in another folder. I was looking into the requests Python library to do this, but I'm wondering how to iterate over the URLs, essentially how to write my loop so that it increments that number in the URL. Apologies in advance if I misuse the terminology. Thanks!
# This may be terrible starting code
# imported the requests library
import requests
# URL of the file to be downloaded
url = "https://marco.ccr.buffalo.edu/data/train/train-00001-of-00407"
# send an HTTP request to the server and save
# the HTTP response in a response object called r
r = requests.get(url)
with open("data.txt",'wb') as f:
    # write the contents of the response (r.content)
    # to a new file in binary mode.
    f.write(r.content)
You can generate the URLs like this and perform a GET for each:
for i in range(1,408):
    url = "https://marco.ccr.buffalo.edu/data/train/train-" + str(i).zfill(5) + "-of-00407"
    print (url)
Also use a variable in the filename to keep a separate copy of each download. For example, use this:
with open("data" + str(i) + ".txt",'wb') as f:
Overall code may look something like this (not exactly this)
import requests
for i in range(1,408):
    url = "https://marco.ccr.buffalo.edu/data/train/train-" + str(i).zfill(5) + "-of-00407"
    r = requests.get(url)
    # you might have to change the extension
    with open("data" + str(i).zfill(5) + ".txt",'wb') as f:
        f.write(r.content)
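The zero-padded numbering can also be produced with a format specifier, and large files can be streamed to disk instead of being held in memory. A minimal sketch of that variation: it swaps in the standard library's urllib.request for requests so that it has no third-party dependency, and the names shard_urls and download_all are invented for illustration:

```python
import shutil
import urllib.request

# {index:05d} pads the number with zeros to five digits, like str(i).zfill(5)
BASE = "https://marco.ccr.buffalo.edu/data/train/train-{index:05d}-of-{total:05d}"


def shard_urls(total=407):
    """Yield (url, local filename) pairs for files 1..total."""
    for index in range(1, total + 1):
        url = BASE.format(index=index, total=total)
        yield url, "data{:05d}.txt".format(index)


def download_all(total=407):
    """Fetch every file; not called here because it needs network access."""
    for url, filename in shard_urls(total):
        # copy the response body to disk in chunks
        with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
            shutil.copyfileobj(response, out)


pairs = list(shard_urls())
print(pairs[0])
print(pairs[-1])
```

Calling download_all() would then fetch all 407 files into the current directory; as in the answer above, the .txt extension may need to change depending on what the files actually contain.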
I am currently pulling data via an API and am attempting to write the data into a CSV in order to run calculations in SQL. I am able to pull the data and open the CSV; however, an error occurs when the data is written into the CSV: each individual character is separated by a comma.
I am new to working with JSON data so I am curious if I need to perform an intermediary step between pulling the JSON data and inserting it into a CSV. Any help would be greatly appreciated as I am completely stuck on this (even the data provider does not seem to know how to get around this).
Please see the code below:
import requests
import time
import pyodbc
import csv
import json
headers = {'Authorization': 'Token'}
Metric1 = ['Website1','Website2']
Metric2 = ['users','hours','responses','visits']
Metric3 = ['Country1','Country2','Country3']
obs_list = []
obs_file = r'TEST.csv'
with open(obs_file, 'w') as csvfile:
    f=csv.writer(csvfile)
    for elem1 in Metric1:
        for elem2 in Metric2:
            for elem3 in Metric3:
                URL = "www.data.com"
                r = requests.get(URL, headers=headers, verify=False)
                for elem in r:
                    f.writerow(elem)
Edit: When I print the data instead of writing it to a CSV, the data appears in the command window in the following format:
[timestamp, metric], [timestamp, metric], [timestamp, metric] ...
Timestamp = 12 digit character
Metric = decimal value
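One likely cause, offered as a guess since the response body is not shown: writerow expects a sequence of fields, and when it is handed a string it treats every character as its own field. Iterating over the response object directly yields raw chunks of the body rather than parsed records. A minimal sketch of the intermediary step, assuming the body is a JSON array of [timestamp, metric] pairs as described in the edit above. The sample text and the name write_observations are invented for illustration:

```python
import csv
import io
import json


def write_observations(json_text, csvfile):
    """Parse a JSON array of [timestamp, metric] pairs and write one CSV row per pair."""
    observations = json.loads(json_text)  # with requests this would be r.json()
    writer = csv.writer(csvfile)
    writer.writerow(["timestamp", "metric"])
    writer.writerows(observations)  # each pair is already a two-field row
    return len(observations)


# Hypothetical response body
sample_body = '[["201701010000", 1.5], ["201701010100", 2.25]]'
buffer = io.StringIO()
count = write_observations(sample_body, buffer)
print(buffer.getvalue())

# For comparison: a plain string passed to writerow is split into characters
demo = io.StringIO()
csv.writer(demo).writerow("abc")
print(demo.getvalue())  # a,b,c
```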
I am trying to write a script (Python 2.7.11, Windows 10) to collect data from an API and append it to a csv file.
The API I want to use returns data in json.
It limits the # of displayed records though, and pages them.
So there is a max number of records you can get with a single query, and then you have to run another query, changing the page number.
The API informs you about the nr of pages a dataset is divided to.
Let's assume that the max # of records per page is 100 and the nr of pages is 2.
My script:
import json
import urllib2
import csv
url = "https://some_api_address?page="
limit = "&limit=100"
myfile = open('C:\Python27\myscripts\somefile.csv', 'ab')
def api_iterate():
    for i in xrange(1, 2, 1):
        parse_url = url,(i),limit
        json_page = urllib2.urlopen(parse_url)
        data = json.load(json_page)
        for item in data['someobject']:
            print item ['some_item1'], ['some_item2'], ['some_item3']
        f = csv.writer(myfile)
        for row in data:
            f.writerow([str(row)])
This does not seem to work, i.e. it creates a csv file, but the file is not populated. There is obviously something wrong with either the part of the script which builds the address for the query OR the part dealing with reading json OR the part dealing with writing query to csv. Or all of them.
I have tried using other resources and tutorials, but at some point I got stuck and I would appreciate your assistance.
The url you have given provides a link to the next page as one of the objects. You can use this to iterate automatically over all of the pages.
The script below gets each page, extracts two of the entries from the Dataobject array and writes them to an output.csv file:
import json
import urllib2
import csv
def api_iterate(myfile):
    url = "https://api-v3.mojepanstwo.pl/dane/krs_osoby"
    csv_myfile = csv.writer(myfile)
    cols = ['id', 'url']
    csv_myfile.writerow(cols) # Write a header
    while True:
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()
        for data_object in data['Dataobject']:
            csv_myfile.writerow([data_object[col] for col in cols])
        try:
            url = data['Links']['next'] # Get the next url
        except KeyError as e:
            break
with open(r'e:\python temp\output.csv', 'wb') as myfile:
    api_iterate(myfile)
This will give you an output file looking something like:
id,url
1347854,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1347854
1296239,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1296239
705217,https://api-v3.mojepanstwo.pl/dane/krs_osoby/705217
802970,https://api-v3.mojepanstwo.pl/dane/krs_osoby/802970
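For Python 3, where urllib2 no longer exists, the same follow-the-next-link loop can be sketched as below. This is an illustration under assumptions: the page layout (a 'Dataobject' list and an optional 'Links'/'next' URL) is taken from the answer above, while fetch_json, iterate_pages, and the fake pages are hypothetical names and data used so the loop can run without network access.

```python
import csv
import io
import json
import urllib.request


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with urllib.request.urlopen(url) as page:
        return json.load(page)


def iterate_pages(start_url, csvfile, cols=("id", "url"), fetch=fetch_json):
    """Follow 'next' links from start_url, writing the chosen columns of each record."""
    writer = csv.writer(csvfile)
    writer.writerow(cols)  # header row
    url, pages = start_url, 0
    while url:
        data = fetch(url)
        pages += 1
        for record in data["Dataobject"]:
            writer.writerow([record[col] for col in cols])
        # a missing 'next' link gives None, which ends the loop
        url = data.get("Links", {}).get("next")
    return pages


# Fake two-page API standing in for the real endpoint
fake_pages = {
    "page1": {"Dataobject": [{"id": 1, "url": "u1"}], "Links": {"next": "page2"}},
    "page2": {"Dataobject": [{"id": 2, "url": "u2"}], "Links": {}},
}
buffer = io.StringIO()
page_count = iterate_pages("page1", buffer, fetch=fake_pages.__getitem__)
print(buffer.getvalue())
```

Against a real API, the fetch argument would be left at its default, and the buffer would be replaced by a file opened in text mode with newline='' (the Python 3 counterpart of the 'wb' mode used above).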