I have about 200 thousand imdb_id values in a file and want to get JSON information for each of them from the OMDb API.
I wrote this code and it works correctly, but it's very slow (3 seconds per id, which would take about 166 hours):
import urllib.request
import csv
import datetime
from collections import defaultdict

i = 0
columns = defaultdict(list)

with open('a.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

with open('a.csv', 'r', encoding='utf-8') as csvinput:
    with open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "item_id":
                writer.writerow(row + ["movie_info"])
            else:
                url = urllib.request.urlopen(
                    "http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&apikey=??????").read()
                url = url.decode('utf-8')
                writer.writerow(row + [url])
                i = i + 1
What's the fastest way to get movie info from OMDb with Python?
**Edit:** I wrote the code below, and after getting 1022 URL responses I get this error:
import grequests

urls = open("a.csv").readlines()
api_key = '??????'

def exception_handler(request, exception):
    print("Request failed")

# read the file and build the request URL for each line
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key

requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)

with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)
The error is:
Traceback (most recent call last):
  File "C:/python_apps/omdb_async.py", line 18, in <module>
    outfile.write(response.content)
AttributeError: 'NoneType' object has no attribute 'content'
How can I solve this error?
This code is I/O-bound and would benefit greatly from Python's async/await capabilities. You can loop over your collection of URLs, creating an asynchronously executing request for each, much like the example in this SO question.
Once you're making these requests asynchronously, you may need to throttle your request rate to something within the OMDB API limit.
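As a rough illustration of both points, here is a minimal sketch using asyncio and aiohttp (not your exact code): it assumes one bare IMDb id per line in a.csv and uses a placeholder API key, and it caps the number of in-flight requests with a semaphore. The CONCURRENCY value is an assumption; you may still need stricter per-second throttling to stay inside the OMDb limit.

import asyncio
import aiohttp

API_KEY = "??????"   # placeholder, as in the question
CONCURRENCY = 20     # assumption: tune to the OMDb API limit

async def fetch(session, sem, imdb_id):
    url = f"http://www.omdbapi.com/?i=tt{imdb_id}&apikey={API_KEY}"
    async with sem:                           # limit in-flight requests
        async with session.get(url) as resp:
            return await resp.json()

async def main(imdb_ids):
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, sem, i) for i in imdb_ids]
        # return_exceptions=True keeps one failed request from killing the batch
        return await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    with open("a.csv", encoding="utf-8") as f:
        ids = [line.strip() for line in f if line.strip()]
    results = asyncio.run(main(ids))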
Related
I'm a Python beginner and I'm pulling data from this URL: https://api.openweathermap.org/data/2.5
I'm trying to write the data I get into a CSV file, but the fields are all over the place (see link to image below).
This is my code:
import requests
import csv
import json

API_KEY = 'redacted'
BASE_URL = 'https://api.openweathermap.org/data/2.5/weather'

city = input('Enter a city name: ')
request_url = f"{BASE_URL}?appid={API_KEY}&q={city}"

csvheaders = ['City', 'Description', 'Temp.']

response = requests.get(request_url)
if response.status_code == 200:
    data = response.json()
    city = data['name']
    weather = data['weather'][0]['description']
    temperature = round(data['main']['temp'] - 273.15, 2)
else:
    print('Error')

with open('weather_api.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(csvheaders)
    writer.writerows([city, weather, temperature])

print('done')
And the resulting CSV output looks like this.
Could someone tell me what I'm doing wrong and how I can accurately pull the data into the correct columns? That would be much appreciated.
If there is a much simpler way of doing this I'm all ears!
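A hedged guess at the cause: csv.writer.writerows() expects a sequence of rows, so passing [city, weather, temperature] makes each value its own row, and strings then get spread one character per cell. Writing the three values as a single row (reusing the variables from the snippet above) should keep the columns lined up:

with open('weather_api.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(csvheaders)
    writer.writerow([city, weather, temperature])  # one row, three columns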
I need to post some records to a website. I feel I am done with the complex part (the code itself); now I need to tweak the code so that my account doesn't get blocked while posting (yep, that just happened).
# importing libraries
import csv
import json
import requests

# changing data types
field_types = [('subject', str),
               ('description', str),
               ('email', str)]

output = []

# opening the raw file
with open('file.csv', 'r', encoding='utf-8-sig') as f:
    for row in csv.DictReader(f):
        row.update((key, conversion(row[key]))
                   for key, conversion in field_types)
        output.append(row)  # appending rows

with open('tickets.json', 'w') as outfile:  # saving records as json
    json.dump(output, outfile, sort_keys=True, indent=4)

with open('tickets.json', 'r') as infile:
    indata = json.load(infile)

output = []
for data in indata:
    r = requests.post("https://" + domain + ".domain.com/api/", auth=(api_key, password), headers=headers, json=data)
    output.append(json.loads(r.text))

# saving the responses
with open('response.json', 'w') as outfile:
    json.dump(output, outfile, indent=4)
I searched and found time.sleep(5), but I'm not sure how to use it. Should it go before output.append(json.loads(r.text))?
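For what it's worth, a minimal sketch of one possible placement (reusing the names from the snippet above; the 5-second delay is an assumption, not a documented limit) is to pause once per iteration, after the response has been recorded, so the POSTs are evenly spaced:

import time

for data in indata:
    r = requests.post("https://" + domain + ".domain.com/api/",
                      auth=(api_key, password), headers=headers, json=data)
    output.append(json.loads(r.text))
    time.sleep(5)  # assumed delay; adjust to the site's actual rate limit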
I am trying to write a loop that gets JSON from a URL via requests, then writes the JSON to a .csv file. Then I need it to do that over and over again until my list of names (a .txt file, 89 lines) is finished. I can't get it to go over the list; I just get the error:
AttributeError: module 'response' has no attribute 'append'
I can't find the issue. If I change 'response' to 'responses' I also get an error:
with open('listan-{}.csv'.format(pricelists), 'w') as outf:
OSError: [Errno 22] Invalid argument: "listan-['A..
I can't seem to find a loop that fits my purpose. Since I am a total beginner in Python, I hope I can get some help here and learn more.
My code so far:
#Opens the file with pricelists
pricelists = []
with open('prislistor.txt', 'r') as f:
    for i, line in enumerate(f):
        pricelists.append(line.strip())

# build responses
responses = []
for pricelist in pricelists:
    response.append(requests.get('https://api.example.com/3/prices/sublist/{}/'.format(pricelist), headers=headers))

#Format each response
fullData = []
for response in responses:
    parsed = json.loads(response.text)
    listan = (json.dumps(parsed, indent=4, sort_keys=True))
    #Converts and creates a .csv file.
    fullData.append(parsed['Prices'])

with open('listan-{}.csv'.format(pricelists), 'w') as outf:
    dw.writeheader()
    for data in fullData:
        dw = csv.DictWriter(outf, data[0].keys())
        for row in data:
            dw.writerow(row)
    print("The file list-{}.csv is created!".format(pricelists))
Can you make the changes below in the place where you are making the API call (import the json library as well) and see?
import json

responses = []
for pricelist in pricelists:
    response = requests.get('https://api.example.com/3/prices/sublist/{}/'.format(pricelist), headers=headers)
    response_json = json.loads(response.text)
    responses.append(response_json)
and the code below should also be in a loop that iterates over the items in pricelists:
for pricelist in pricelists:
    with open('listan-{}.csv'.format(pricelist), 'w') as outf:
        for data in fullData:
            dw = csv.DictWriter(outf, data[0].keys())
            dw.writeheader()
            for row in data:
                dw.writerow(row)
        print("The file list-{}.csv is created!".format(pricelist))
Finally got it working. I got help from another question I created here on the forum. #waynelpu
The mistake I made was not putting the code into a loop.
Here is the code that worked like a charm.
pricelists = []
with open('prislistor.txt', 'r') as f:
    for i, line in enumerate(f):  # from here on, the looping code block starts with 8 spaces
        pricelists = (line.strip())
        # Keeps the indents
        response = requests.get('https://api.example.se/3/prices/sublist/{}/'.format(pricelists), headers=headers)
        # Formats it
        parsed = json.loads(response.text)
        listan = (json.dumps(parsed, indent=4, sort_keys=True))
        # Converts and creates a .csv file.
        data = parsed['Prices']
        with open('listan-{}.csv'.format(pricelists), 'w') as outf:
            dw = csv.DictWriter(outf, data[0].keys())
            dw.writeheader()
            for row in data:
                dw.writerow(row)
        print("The file list-{}.csv is created!".format(pricelists))
    # code here is outside the loop but still INSIDE the 'with' block, so you can still access f here
# code here leaves all blocks
I am querying an API from a website. The API will be down for maintenance from time to time, and at times there may be no data available to query. I have written the code to keep forcing the program to query the API even after an error, but it doesn't seem to be working.
The following is the code:
import threading
import json
import urllib
from urllib.parse import urlparse
import httplib2 as http  # External library
import datetime
import pyodbc as db
import os
import gzip
import csv
import shutil

def task():
    # Authentication parameters
    headers = {'AccountKey': 'secret',
               'accept': 'application/json'}  # this is by default

    # API parameters
    uri = 'http://somewebsite.com/'  # Resource URL
    path = '/something/TrafficIncidents?'

    # Build query string & specify type of API call
    target = urlparse(uri + path)
    print(target.geturl())
    method = 'GET'
    body = ''

    # Get handle to http
    h = http.Http()

    # Obtain results
    response, content = h.request(target.geturl(), method, body, headers)
    api_call_time = datetime.datetime.now()

    filename = "traffic_incidents_" + str(datetime.datetime.today().strftime('%Y-%m-%d'))
    createHeader = 1
    if os.path.exists(filename + '.csv'):
        csvFile = open(filename + '.csv', 'a')
        createHeader = 0
    else:
        # compress previous day's file
        prev_filename = "traffic_incidents_" + (datetime.datetime.today() - datetime.timedelta(days=1)).strftime('%Y-%m-%d')
        if os.path.exists(prev_filename + '.csv'):
            with open(prev_filename + '.csv', 'rb') as f_in, gzip.open(prev_filename + '.csv.gz', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
            os.remove(prev_filename + '.csv')
        # create new csv file for writing
        csvFile = open(filename + '.csv', 'w')

    # Parse JSON to print
    jsonObj = json.loads(content)
    print(json.dumps(jsonObj, sort_keys=True, indent=4))
    with open("traffic_incidents.json", "w") as outfile:
        # Saving jsonObj["d"]
        json.dump(jsonObj, outfile, sort_keys=True, indent=4, ensure_ascii=False)

    for i in range(len(jsonObj["value"])):
        jsonObj["value"][i]["IncidentTime"] = jsonObj["value"][i]["Message"].split(' ', 1)[0]
        jsonObj["value"][i]["Message"] = jsonObj["value"][i]["Message"].split(' ', 1)[1]
        jsonObj["value"][i]["ApiCallTime"] = api_call_time

    # Save to csv file
    header = jsonObj["value"][0].keys()
    csvwriter = csv.writer(csvFile, lineterminator='\n')
    if createHeader == 1:
        csvwriter.writerow(header)
    for i in range(len(jsonObj["value"])):
        csvwriter.writerow(jsonObj["value"][i].values())
    csvFile.close()

    t = threading.Timer(120, task)
    t.start()

while True:
    try:
        task()
    except IndexError:
        pass
    else:
        break
I get the following error and the program stops:
"header = jsonObj["value"][0].keys()
IndexError: list index out of range"
I would like the program to keep running even after the IndexError has occurred.
How can I edit the code to achieve that?
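One likely explanation from the code shown: after the first successful call, task() reschedules itself on a threading.Timer, so later runs happen on a timer thread where the main loop's try/except can never catch the IndexError, and because t.start() comes after the line that fails, no further run gets scheduled. A rough sketch of one way around that (a placeholder jsonObj stands in for the parsed response; the real request and CSV code stay as in the question) is to guard the empty case inside task() and reschedule in a finally block:

import threading

def task():
    # ... request the API and build jsonObj exactly as in the question ...
    jsonObj = {"value": []}  # placeholder for the parsed response
    try:
        if jsonObj["value"]:  # only process and write when data came back
            header = jsonObj["value"][0].keys()
            # ... CSV writing as in the question ...
    except IndexError:
        print("No incidents returned; retrying on the next cycle")
    finally:
        threading.Timer(120, task).start()  # reschedule even after an error

task()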
I've built a script that crawls court listings in the UK, generates a list of links to each court's address page, and then scrapes the address from each page.
It works pretty well so far, but I am stuck at the "write to CSV" bit. I think it has to do with iteritems() lacking a get method, based on a similar problem. I get that an iterator doesn't have the same methods as an iterable (I am using an iterator in my code), but that didn't help me solve my particular problem.
Here's my code:
import csv
import time
import random
import requests
from bs4 import BeautifulSoup as bs

# lambda expression to request url and parse it through bs
soup = lambda url: bs((requests.get(url)).text, "html.parser")

def crawl_court_listings(base, buff, char):
    """ """
    # common URL segment + buffer URL segment + end character -> URL
    url = base + buff + str(chr(char))
    # soup lambda expression -> grab first unordered list
    links = (soup(url)).find('div', {'class', 'content inner cf'}).find('ul')
    # empty dictionary
    results = {}
    # loop through links, get link title and href
    for item in links.find_all('a', href=True):
        court_link = item['href']
        title = item.string
        # generate full court address page url from href
        full_court_link = base + court_link
        # save title and full URL to results
        results[title] = full_court_link
    # increment char var by 1
    char += 1
    # return results dict and incremented char value
    return results, char

def get_court_address(court_name, full_court_link):
    """ """
    # get horrible chunk of poorly formatted address(es)
    address_blob = (soup(full_court_link)).find('div', {'id': 'addresses'}).text
    # clean the blob
    clean_address = ("\n".join(line.strip() for line in address_blob.split("\n")))
    # write to csv
    with open('court_addresses.csv', 'w') as csvfile:
        fieldnames = [court_name, full_court_link, clean_address]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writerow(fieldnames)

if __name__ == "__main__":
    base = 'https://courttribunalfinder.service.gov.uk/'
    buff = 'courts/'
    # 65 = "A". Starting from char "A", retrieve list of titles and links for court addresses. Return char + 1
    results, char = crawl_court_listings(base, buff, 65)
    # 90 = "Z". Until Z, pass title and link from results into get_court_address(), then wait a few seconds
    while char <= 90:
        for t, l in results.iteritems():
            get_court_address(t, l)
            time.sleep(random.randint(0, 5))
When I run this, I get the following:
Traceback (most recent call last):
  File ".\CourtScraper.py", line 63, in <module>
    get_court_address(t, l)
  File ".\CourtScraper.py", line 49, in get_court_address
    writer.writerow(fieldnames)
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "c:\python27\Lib\csv.py", line 149, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
AttributeError: 'list' object has no attribute 'get'
Even though I get an error, it produces the CSV file with cells A1 and A2 populated with the title and full_court_link, but no address. The address (when printed) looks like this:
Write to us:
1st Floor
Piccadilly Exchange
Piccadilly Plaza
Manchester
Greater Manchester
M1 4AH
So my first thought was that I was trying to write multi-line text into a single cell, which was causing the error, but I'm not really sure how to confirm that. I used print(type(address)), which came back as unicode and not a list, so I don't think that's causing the issue. I don't understand where the list the error refers to comes from, if that makes sense.
If it is the iteritems() method causing the issue, how do I go about resolving it?
Can someone explain the error and point me in the direction of solving it please?
Your problem is here:
writer.writerow(fieldnames)
"fieldnames" is a list of field names. You need to pass a dict of key-value pairs. So it should look more like this:
# write to csv
with open('court_addresses.csv', 'w') as csvfile:
    # note - these are strings, not variables
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({"court_name": court_name,
                     "full_court_link": full_court_link,
                     "clean_address": clean_address})
PSST: you have another issue. You are re-opening your output file for every court that you parse. You probably want to open that file once (under __main__) and then pass the handle into get_court_address()
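A rough sketch of that suggestion (a slight variation: passing the DictWriter rather than the raw file handle, with the scraping itself omitted and a placeholder results dict standing in for crawl_court_listings()):

import csv

def get_court_address(writer, court_name, full_court_link):
    clean_address = "..."  # scrape and clean the address here, as in the question
    writer.writerow({'court_name': court_name,
                     'full_court_link': full_court_link,
                     'clean_address': clean_address})

if __name__ == "__main__":
    # placeholder for the dict returned by crawl_court_listings()
    results = {"Example Court": "https://courttribunalfinder.service.gov.uk/courts/example"}
    with open('court_addresses.csv', 'w') as csvfile:
        fieldnames = ['court_name', 'full_court_link', 'clean_address']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for title, link in results.items():
            get_court_address(writer, title, link)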
For each row you are writing, you need to pass in a dictionary; you are passing in the header list.
https://docs.python.org/2/library/csv.html#csv.DictWriter
# write to csv
with open('court_addresses.csv', 'w') as csvfile:
    fieldnames = [court_name, full_court_link, clean_address]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow(fieldnames)
                    ^^^^^^^^^^ This should be a dict
The dict needs to look like:
{'court_name': X, 'full_court_link': Y, 'clean_address': Z}
HTH
with open('court_addresses.csv', 'w') as csvfile:
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'court_name': court_name, 'full_court_link': full_court_link, 'clean_address': clean_address})