I am not a Python geek, but I have tried to solve this problem using information from several answers to similar questions; none seem to really work in my case. Here it is:
I am calling a function from a Python script. Here is the function:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        st = get data from site 2 using X
    #some codes that depend on st
I am calling this from a Python script like so:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        st1 = getsom(X)
        #some code that depends on st1
        day += 1
This works fine when data is available on either site 1 or site 2 for a particular day, but breaks down when it is unavailable on both sites.
When data is unavailable on both sites for a particular day, I want to move on and check the next day. I have tried different configurations of try and except with no success, and would appreciate any help on the most efficient way to do this.
Thanks!
*** Edit
Final version that worked:
In the function part:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        try:
            st = get data from site 2 using X
        except:
            print "data not available from sites 1 and 2"
            st = None
    if st is not None:
        #some codes that depend on st
To iterate to the next day on the script side, I had to handle the None case from the function with another try/except block:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        try:
            st = getsom(X)
        except:
            st = None
        if st is not None:
            #some code that depends on st
        day += 1
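For anyone who wants to run the pattern end to end, here is a minimal, self-contained sketch; the two fetch functions are hypothetical stand-ins for the real site calls:

def get_from_site1(X):
    # hypothetical fetcher; simulate a failure on site 1
    raise IOError("site 1 has no data for %s" % X)

def get_from_site2(X):
    # hypothetical fetcher for site 2
    return "data for %s" % X

def getsom(X):
    try:
        st = get_from_site1(X)
    except IOError:
        print "not available from site 1, getting from site 2"
        try:
            st = get_from_site2(X)
        except IOError:
            print "data not available from sites 1 and 2"
            st = None
    return st

st = getsom("1998-07-21")
if st is not None:
    print st  # code that depends on st goes here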
As mentioned in the comments, it sounds like you want to catch the second exception inside the first-level exception handler. You can do it like this:
def getsom(X):
    #some codes
    try:
        st = get data from site 1 using X
    except:
        print "not available from site 1, getting from site 2"
        try:
            st = get data from site 2 using X
        except:
            print "Not available from site 2 as well."
            # Here you can assign some arbitrary value to your variable
            # (None, for example) or return from the function.
    #some codes that depend on st
If the data is not available on either of the sites, you can assign some arbitrary value to your variable st or simply return from the function.
Is this what you are looking for?
Also, you shouldn't simply write except without specifying the type of exception you expect - look here for more info: Should I always specify an exception type in `except` statements?
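For example, a quick sketch of what a specific except clause looks like (IOError here is just an assumed error type, not what the sites actually raise):

try:
    st = fetch_data(X)  # hypothetical fetch call
except IOError as e:
    # only the expected failure is caught; anything unexpected still surfaces
    print "fetch failed:", e
    st = None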
Edit, to answer the problem in the comment:
If you have no data for a certain day, you can just return None and handle it like this:
#some codes
for yr in range(min_yr, max_yr+1):
    day = 1
    while day < max_day:
        st1 = getsom(X)
        if st1 is not None:
            #some code that depends on st1
        day += 1
Why don't you create a separate function for it?
def getdata(X):
    for site in [site1, site2]:  # possibly more
        try:
            return get_data_from_site_using_X()
        except:
            print "not available in %s" % site
    print "couldn't find data anywhere"
Then getsom becomes:
def getsom(X):
    #some codes
    st = getdata(X)
    #some codes that depend on st
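For concreteness, getdata could look like the following sketch; the URLs and the urllib2-based fetch are assumptions, not part of the original answer:

import urllib2

def getdata(X, sites=("http://site1.example/%s", "http://site2.example/%s")):
    # try each site in turn; fall through to the next one on failure
    for site in sites:
        try:
            return urllib2.urlopen(site % X).read()
        except urllib2.URLError:
            print "not available in %s" % site
    print "couldn't find data anywhere"
    return None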
On auction websites, there is a clock counting down the time remaining. I am trying to extract that piece of information (among others) to print to a CSV file.
For example, I am trying to take the value after 'Time Left:' on this site: https://auctionofchampions.com/Michael_Jordan___Magic_Johnson_Signed_Lmt__Ed__Pho-LOT271177.aspx
I have tried three different options, without any success:
1)
time = ''
try:
    time = soup.find(id='tzcd').text.replace('Time Left:','')
    #print("Time: ", time)
except Exception as e:
    print(e)

2)
time = ''
try:
    time = soup.find(id='tzcd').text
    #print("Time: ", time)
except:
    pass

3)
time = ''
try:
    time = soup.find('div', id="BiddingTimeSection").find_next_sibling("div").text
    #print("Time: ", time)
except:
    pass
I am a new user of Python and don't know if this fails because of the date/time structure of the page or because of something else inherently flawed in my code.
Any help would be greatly appreciated!
That information is pulled into the page via a JavaScript XHR call; you can see it by inspecting the Network tab in your browser's Dev tools. The following code will get you the time left in seconds:

import requests

s = requests.Session()
header = {'X-AjaxPro-Method': 'GetTimerText'}
payload = '{"inventoryId":271177}'

# load the lot page first, so the session carries its cookies
r = s.get('https://auctionofchampions.com/Michael_Jordan___Magic_Johnson_Signed_Lmt__Ed__Pho-LOT271177.aspx')
s.headers.update(header)

# then call the AjaxPro endpoint directly
r = s.post('https://auctionofchampions.com/ajaxpro/LotDetail,App_Web_lotdetail.aspx.cdcab7d2.1voto_yr.ashx', data=payload)
print(r.json()['value']['timeLeft'])
Response:
792309
792309 seconds is a bit over 9 days. There are easy ways to break that down into days/hours/minutes, if you want.
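For example, a quick sketch of that conversion with divmod:

def human_time(total_seconds):
    # split a seconds count into days/hours/minutes/seconds
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return '%dd %dh %dm %ds' % (days, hours, minutes, seconds)

print(human_time(792309))  # -> 9d 4h 5m 9s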
I have some experience in Python, but I have never used try/except to catch errors, due to lack of formal training.
I am working on extracting a few articles from Wikipedia. For this I have an array of titles, a few of which have no article or search result. I would like the page-retrieval function to just skip those few names and continue running the script on the rest. Reproducible code follows.
import wikipedia
# This one works.
links = ["CPython"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
#The sequence breaks down if there is no wikipedia page.
links = ["CPython","no page"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
The library retrieves pages with a method like this. Normally changing it would be really bad practice, but since this is just for a one-off data extraction, I am willing to change my local copy of the library to get it to work. Edit: I have included the complete function now.
def page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False):
    '''
    Get a WikipediaPage object for the page with title `title` or the pageid
    `pageid` (mutually exclusive).

    Keyword arguments:

    * title - the title of the page to load
    * pageid - the numeric pageid of the page to load
    * auto_suggest - let Wikipedia find a valid page title for the query
    * redirect - allow redirection without raising RedirectError
    * preload - load content, summary, images, references, and links during initialization
    '''

    if title is not None:
        if auto_suggest:
            results, suggestion = search(title, results=1, suggestion=True)
            try:
                title = suggestion or results[0]
            except IndexError:
                # if there is no suggestion or search results, the page doesn't exist
                raise PageError(title)
        return WikipediaPage(title, redirect=redirect, preload=preload)
    elif pageid is not None:
        return WikipediaPage(pageid=pageid, preload=preload)
    else:
        raise ValueError("Either a title or a pageid must be specified")
What should I do to retrieve only the pages that do not give the error? Maybe there is a way to filter out all items in the list that give this error, or an error of some kind. Returning "NA" or similar would be fine for pages that don't exist. Skipping them without notice would be fine too. Thanks!
The function wikipedia.page will raise a wikipedia.exceptions.PageError if the page doesn't exist. That's the error you want to catch.
import wikipedia

links = ["CPython", "no page"]
test = []
for link in links:
    try:
        # try to load the wikipedia page
        page = wikipedia.page(link, auto_suggest=False)
        test.append(page)
    except wikipedia.exceptions.PageError:
        # if a PageError was raised, ignore it and continue to the next link
        continue
You have to surround the call to wikipedia.page with a try block, so I'm afraid you can't use a list comprehension directly.
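If you want to keep something close to the comprehension style, one option (a sketch, not part of the wikipedia API) is a small wrapper that returns None on failure, plus a filter afterwards:

import wikipedia

def safe_page(link):
    # return the page, or None if it doesn't exist
    try:
        return wikipedia.page(link, auto_suggest=False)
    except wikipedia.exceptions.PageError:
        return None

links = ["CPython", "no page"]
pages = [p for p in (safe_page(link) for link in links) if p is not None]
test = [p.content for p in pages]
print(test)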
Understand that this is bad practice, but for a one-off quick-and-dirty script you can just:
Edit: Wait, sorry. I've just noticed the list comprehension. I'm actually not sure this will work without breaking that down:
links = ["CPython", "no page"]
test = []
for link in links:
try:
page = wikipedia.page(link, auto_suggest=False)
test.append(page)
except wikipedia.exceptions.PageError:
pass
test = [testitem.content for testitem in test]
print(test)
`pass` essentially tells Python to trust you and ignore the error, so that it can continue on about its day.
I'm looking at scraping some data from Facebook using Python 2.7. My code basically increments the Facebook profile ID by 1 and then captures the details returned by the page.
An example of the page I'm looking to capture the data from is graph.facebook.com/4.
Here's my code below:
import scraperwiki
import urlparse
import simplejson

source_url = "http://graph.facebook.com/"
profile_id = 1

while True:
    try:
        profile_id += 1
        profile_url = urlparse.urljoin(source_url, str(profile_id))
        results_json = simplejson.loads(scraperwiki.scrape(profile_url))
        for result in results_json['results']:
            print result
            data = {}
            data['id'] = result['id']
            data['name'] = result['name']
            data['first_name'] = result['first_name']
            data['last_name'] = result['last_name']
            data['link'] = result['link']
            data['username'] = result['username']
            data['gender'] = result['gender']
            data['locale'] = result['locale']
            print data['id'], data['name']
            scraperwiki.sqlite.save(unique_keys=['id'], data=data)
            #time.sleep(3)
    except:
        continue
    profile_id += 1
I am using the scraperwiki site to carry out this check, but no data is printed back to the console, despite the line `print data['id'], data['name']` being there just to check that the code is working.
Any suggestions on what is wrong with this code? As said, for each returned profile, the unique data should be captured and printed to screen as well as populated into the sqlite database.
Thanks
Any suggestions on what is wrong with this code?
Yes. You are swallowing all of your errors. There could be a huge number of things going wrong in the block under try. If anything goes wrong in that block, you move on without printing anything.
You should only ever use a try / except block when you are looking to handle a specific error.
Modify your code so that it looks like this:
while True:
    profile_id += 1
    profile_url = urlparse.urljoin(source_url, str(profile_id))
    results_json = simplejson.loads(scraperwiki.scrape(profile_url))
    for result in results_json['results']:
        print result
        data = {}
        # ... more ...
and then you will get detailed error messages when specific things go wrong.
As for your concern in the comments:
The reason I have the error handling is because, if you look for example at graph.facebook.com/3, this page contains no user data, and so I don't want to collate this info and want to skip to the next user, i.e. no. 4, etc.
If you want to handle the case where there is no data, then find a way to handle that case specifically. It is bad practice to swallow all errors.
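For instance, a sketch of handling just that case; it assumes the decoded JSON for one profile is a plain dict, which may not match the real Graph API response exactly:

def extract_profile(result):
    # pull out the fields we need, or return None when the profile
    # carries no user data (the specific case we expect and can handle)
    try:
        return {'id': result['id'], 'name': result['name']}
    except KeyError:
        return None

print extract_profile({'id': 4, 'name': 'Mark'})   # -> {'id': 4, 'name': 'Mark'}
print extract_profile({'error': 'no user data'})   # -> None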
My script below scrapes a website and returns the data from a table. It's not finished, but it works. The problem is that it has no error checking. Where should I have error handling in my script?
There are also no unit tests. Should I write some and schedule them to run periodically, or should the error handling be done in my script?
Any advice on the proper way to do this would be great.
#!/usr/bin/env python
''' Gets the Canadian Monthly Residential Bill Calculations table
from URL and saves the results to a sqllite database.
'''
import urllib2
from BeautifulSoup import BeautifulSoup


class Bills():
    ''' Canadian Monthly Residential Bill Calculations '''

    URL = "http://www.hydro.mb.ca/regulatory_affairs/energy_rates/electricity/utility_rate_comp.shtml"

    def __init__(self):
        ''' Initialization '''
        self.url = self.URL
        self.data = []
        self.get_monthly_residential_bills(self.url)

    def get_monthly_residential_bills(self, url):
        ''' Gets the Monthly Residential Bill Calculations table from URL '''
        doc = urllib2.urlopen(url)
        soup = BeautifulSoup(doc)
        res_table = soup.table.th.findParents()[1]
        results = res_table.findNextSibling()
        header = self.get_column_names(res_table)
        self.get_data(results)
        self.save(header, self.data)

    def get_data(self, results):
        ''' Extracts data from search result. '''
        rows = results.childGenerator()
        data = []
        for row in rows:
            if row == "\n":
                continue
            for td in row.contents:
                if td == "\n":
                    continue
                data.append(td.text)
            self.data.append(tuple(data))
            data = []

    def get_column_names(self, table):
        ''' Gets table title, subtitle and column names '''
        results = table.findAll('tr')
        title = results[0].text
        subtitle = results[1].text
        cols = results[2].childGenerator()
        column_names = []
        for col in cols:
            if col == "\n":
                continue
            column_names.append(col.text)
        return title, subtitle, column_names

    def save(self, header, data):
        pass


if __name__ == '__main__':
    a = Bills()
    for td in a.data:
        print td
See the documentation of all the functions you call and check what exceptions they throw.
For example, urllib2.urlopen() is documented to raise URLError on errors, which is a subclass of IOError.
So, for urlopen(), you could do something like:

import sys

try:
    doc = urllib2.urlopen(url)
except IOError:
    print >> sys.stderr, 'Error opening URL'
Similarly, do the same for the others.
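For the parsing step in get_monthly_residential_bills, the likely failures are attribute and index errors when the page layout changes; a sketch along the same lines:

try:
    res_table = soup.table.th.findParents()[1]
except (AttributeError, IndexError):
    # soup.table came back as None, or the parent chain is shorter than expected
    print >> sys.stderr, 'Unexpected page layout at %s' % url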
You should write unit tests and you should use exception handling. But only catch the exceptions you can handle; you do no one any favors by catching everything and throwing any useful information out.
Unit tests aren't run periodically though; they're run before and after the code changes (although it is feasible for one change's "after" to become another change's "before" if they're close enough).
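To make the unit-test part concrete, here is a minimal sketch; it assumes the script above is saved as bills.py, and it skips the network fetch by overriding __init__:

import unittest
from BeautifulSoup import BeautifulSoup
from bills import Bills  # assumes the script above lives in bills.py

class BillsOffline(Bills):
    def __init__(self):
        # skip the network fetch in Bills.__init__ so the parsing
        # logic can be tested against a small fixed table
        self.data = []

class TestGetColumnNames(unittest.TestCase):
    def test_parses_title_subtitle_and_columns(self):
        html = ("<table><tr><th>The Title</th></tr>"
                "<tr><th>The Subtitle</th></tr>"
                "<tr><td>City</td><td>Bill</td></tr></table>")
        table = BeautifulSoup(html).table
        title, subtitle, columns = BillsOffline().get_column_names(table)
        self.assertEqual(columns, [u'City', u'Bill'])

if __name__ == '__main__':
    unittest.main()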
A couple of places you need them is when importing things like tkinter:

try:
    import Tkinter as tk   # Python 2
except ImportError:
    import tkinter as tk   # Python 3

Also anywhere the user enters something with an intended type. A good way to find these spots is to run the script and try really hard to make it crash, e.g. by typing in the wrong type.
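For the user-input case, a small sketch (Python 2 syntax to match the rest of this page; on Python 3 you'd use input() and print()):

while True:
    raw = raw_input("Enter a number: ")
    try:
        value = int(raw)
        break
    except ValueError:
        # the user typed something that isn't an int; ask again
        print "%r is not a number, try again" % raw
print "you entered %d" % value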
The answer to "where should I have error handling in my script?" is basically "any place where something could go wrong", which depends entirely on the logic of your program.
In general, any place where your program relies on an assumption that a particular operation worked as you intended, and there's a possibility that it may not have, you should add code to check whether or not it actually did work, and take appropriate remedial action if it didn't. In some cases, the underlying code might generate an exception on failure and you may be happy to just let the program terminate with an uncaught exception without adding any error-handling code of your own, but (1) this would be, or ought to be, rare if anyone other than you is ever going to use that program; and (2) I'd say this would fall into the "works as intended" category anyway.
Currently there's a game that has different groups, and you can play for a prize of 'gold' every hour. Sometimes there is gold, sometimes there isn't. Every hour a post appears on Facebook, such as "gold in group2" or "gold in group6"; other hours there is no post, because no gold is on offer. I want to write a small script that checks the site hourly, grabs the result (whether there is gold or not, and in what group) and displays it back to me. I'm learning Python, so I was wanting to write it in that. Would this be the best language to use? And how would I go about doing this? All I can really find is information on extracting links. I don't want to extract links, just the text. Thanks for any and all help. I appreciate it.
Check out urllib2 for getting HTML from a URL and BeautifulSoup/HTMLParser/etc. to parse the HTML. Then you could use something like this as a starting point for the script:
import sys
import time
import urllib2
import BeautifulSoup
import HTMLParser

def getSource(url, postdata):
    source = ""
    req = urllib2.Request(url, postdata)
    try:
        sock = urllib2.urlopen(req)
    except urllib2.URLError, exc:
        # handle the error..
        pass
    else:
        source = sock.read()
    finally:
        try:
            sock.close()
        except:
            pass
    return source

def parseSource(source):
    pass
    # parse source with BeautifulSoup/HTMLParser, or here...

def main():
    last_run = 0
    while True:
        t1 = time.time()
        # check if 1 hour has passed since last_run
        if t1 - last_run >= 3600:
            source = getSource("someurl.com", "user=me&blah=foo")
            last_run = time.time()
            parseSource(source)
        else:
            # sleep for 60 seconds and check time again.
            time.sleep(60)
    return 0

if __name__ == "__main__":
    sys.exit(main())
Here is a good article about parsing HTML with Python.
I have something similar to what you have, but you left out what my main question revolves around. I looked at HTMLParser and BS, but I am unsure how to do something like `if($posttext == gold) echo "gold in so and so"`. It seems like BS deals a lot with tags, and since Facebook posts can use a variety of tags, how would I go about doing just a search on the text and returning the 'post'?
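One way to search the post text rather than the tags, sketched with BeautifulSoup 3 to match the imports above; the "gold in group" wording and the sample HTML are assumptions:

import re
import BeautifulSoup

def parse_source(source):
    soup = BeautifulSoup.BeautifulSoup(source)
    # text=... matches text nodes directly, so the surrounding tags don't matter
    hits = soup.findAll(text=re.compile(r'gold in group\s*\d+', re.I))
    for hit in hits:
        print hit.strip()

parse_source('<div><p>gold in group2</p><p>no gold this hour</p></div>')
# prints: gold in group2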