Python - How to handle an exception while iterating through a list?

I am using the Selenium library to iterate through a list of items and look them up on the web. The loop works when an item is found, but I am having a hard time handling the case when an item is not found on the page. In that case I know the page will show "No results for" inside a span, which I can access with:
browser.find_by_xpath('(.//span[@class = "a-size-medium a-color-base"])[1]')[0].text
The problem is that this span only appears when the item the loop is searching for is not found. So I tried this logic: if the span doesn't exist, the item was found, so execute the rest of the loop; if the span does exist and equals "No results for", move on and search for the next item. Here is my code:
data = pd.DataFrame()
for i in lookup_list:
    start_url = f"https://www.amazon.com/s?k=" + i + "&ref=nb_sb_noss_1"
    browser.visit(start_url)
    if browser.find_by_xpath('(.//span[@class = "a-size-medium a-color-base"])[1]') is not None:
        # browser.find_by_xpath("//a[@class='a-size-medium a-color-base']"):
        item = browser.find_by_xpath("//a[@class='a-link-normal']")
        item.click()
        html = browser.html
        soup = bs(html, "html.parser")
        collection_dict = {
            'PART_NUMBER': getmodel(soup),
            'DIMENSIONS': getdim(soup),
            'IMAGE_LINK': getImage(soup)
        }
    elif browser.find_by_xpath('(.//span[@class = "a-size-medium a-color-base"])[1]')[0].text != 'No results for':
        continue
    data = data.append(collection_dict, ignore_index=True)
The error I am getting is:
AttributeError: 'ElementList' object has no attribute 'click'
I understand why I get the error: I can't call click on an ElementList, since it contains multiple elements and I can't click all of them. But what I'm trying to do is avoid even reaching that call when the page shows the item is not found; I want the script to simply move on and search for the next item.
How do I modify this?
Thank you in advance.

Using a try-except with a pass is what you want in this situation, as @JammyDodger said. Needing it typically isn't a good sign, though, because most of the time you don't want to simply ignore errors. pass will simply ignore the error and continue with the rest of the loop.
try:
    item.click()
except AttributeError:
    pass
In order to skip to the next iteration of the loop, you may want to use the continue keyword.
try:
    item.click()
except AttributeError:
    continue
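Putting that into your loop, a minimal sketch might look like this (hedged: it assumes the splinter-style browser object from your question, that its ElementList exposes a .first accessor, and that getmodel, getdim, and getImage are your own helpers):
data = pd.DataFrame()
for i in lookup_list:
    start_url = "https://www.amazon.com/s?k=" + i + "&ref=nb_sb_noss_1"
    browser.visit(start_url)
    try:
        # take the first matching link explicitly instead of clicking the whole ElementList
        item = browser.find_by_xpath("//a[@class='a-link-normal']").first
        item.click()
    except Exception:
        # broad catch for a quick script: no clickable result means "No results for"
        continue
    soup = bs(browser.html, "html.parser")
    collection_dict = {
        'PART_NUMBER': getmodel(soup),
        'DIMENSIONS': getdim(soup),
        'IMAGE_LINK': getImage(soup),
    }
    data = data.append(collection_dict, ignore_index=True)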

Related

<class 'str'> throws error when put in list

I'm currently working on a YouTube web scraper for comments.
I want to scrape the comments and put them in a dataframe. My code can only print the text; I'm unable to put the text into a dataframe. When I check the output's type, it is <class 'str'>. I'm able to get the text through this code:
try:
    # Extract the elements storing the usernames and comments.
    username_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
    comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
except exceptions.NoSuchElementException:
    error = "Error: Double check selector OR "
    error += "element may not yet be on the screen at the time of the find operation"
    print(error)
for com_text in comment_elems:
    print(com_text.text)
If I check the text with this code at the end of my function:
for com_text in comment_elems:
    print(type(com_text.text))
then the result is <class 'str'>, and I am unable to put this in a dataframe.
When I do try to put this <class 'str'> object in a dataframe, I get the error: TypeError: 'WebElement' object does not support item assignment
This is the code that I use when trying to put the text in a dataframe:
for username, comment in zip(username_elems, comment_elems):
    comment_section['comment'] = comment.text
    data.append(comment_section)
I'm hoping there is a way to convert the <class 'str'> object into a regular string type, or that there is another step I can take to extract the text from the object.
Here is my full code:
def gitscrape(url):
    # Note: replace argument with absolute path to the driver executable.
    driver = webdriver.Chrome('chromedriver/windows/chromedriver.exe')
    # Navigates to the URL, maximizes the current window, and
    # then suspends execution for (at least) 5 seconds (this gives time for the page to load).
    driver.get(url)
    driver.maximize_window()
    time.sleep(5)
    # Empty subjects
    comment_section = []
    comment_data = []
    try:
        # Extract the elements storing the video title and
        # comment section.
        title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
        comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
    except exceptions.NoSuchElementException:
        # Note: Youtube may have changed their HTML layouts for videos, so raise an error
        # for sanity's sake in case the elements provided cannot be found anymore.
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)
    # Scroll the comment section into view, then allow some time
    # for everything to be loaded as necessary.
    driver.execute_script("arguments[0].scrollIntoView();", comment_section)
    time.sleep(7)
    # Scroll all the way down to the bottom in order to get all the
    # elements loaded (since Youtube dynamically loads them).
    last_height = driver.execute_script("return document.documentElement.scrollHeight")
    while True:
        # Scroll down 'til "next load".
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
        # Wait to load everything thus far.
        time.sleep(2)
        # Calculate new scroll height and compare with last scroll height.
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    # One last scroll just in case.
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    try:
        # Extract the elements storing the usernames and comments.
        username_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
        comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
    except exceptions.NoSuchElementException:
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)
    # for com_text in comment_elems:
    #     print(type(com_text.text))
    #     data.append(comment_section)
    for username, comment in zip(username_elems, comment_elems):
        comment_section['comment'] = comment.text
        data.append(comment_section)
    video1_comments = pd.DataFrame(data)
<class 'str'> is used to represent normal strings in Python. In the code below both print statements print out <class 'str'>. You must be facing a different issue.
a = 12345
a = str(a)
print(type(a))
b = "12345"
print(type(b))
Your error occurs in the line comment_section['comment'] = comment.text. You write that you encounter the error when you try to put a string into a dataframe, but neither comment_section nor comment is a dataframe. In your title you write that adding a string to a list throws the error, but comment_section is not a list either (and if it were, that syntax wouldn't make sense). Code is very sensitive to what you're actually doing, so whether you have a dataframe or a list makes a big difference.
So what type is comment_section actually? If you scroll up through your code, the last assignment to it is comment_section = driver.find_element_by_xpath('//*[@id="comments"]'), so comment_section is neither a dataframe nor a list, but a WebElement! Now the error you got also makes sense: it says TypeError: 'WebElement' object does not support item assignment, and indeed you're trying to assign comment.text to the 'comment' key of the WebElement comment_section, which a WebElement does not support.
You can fix this by not overwriting comment_section, but by using a different name.
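A minimal sketch of that fix, building each row as a fresh dict and leaving the WebElement's name alone (rows is a new name introduced for this sketch; username_elems and comment_elems come from your function):
rows = []
for username, comment in zip(username_elems, comment_elems):
    # a plain dict supports item assignment, unlike a WebElement
    rows.append({
        'username': username.text,
        'comment': comment.text,
    })
video1_comments = pd.DataFrame(rows)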

Exception handling in Selenium, how to iterate through a bunch of element locators until one works

The id of a button I want to click changes dynamically. For example, the id will be id = Button7, and then the next time I run my code it is id = Button19. I noticed it cycles through a set of ids, but in no particular order.
I would like to iterate through all possible ids until one of them works, with something similar to this logic:
try:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_99"]')
    ActionChains(driver).click(source8).perform()
except Exception as e:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_7"]')
    ActionChains(driver).click(source8).perform()
except Exception as e:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_27"]')
    ActionChains(driver).click(source8).perform()
Just iterate over the xpaths:
for xpath in ['//xpath1', '//xpath2', '//xpath3']:
    try:
        # do something with xpath
        break
    except:
        print(xpath + " failed!")
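Applied to the button ids from the question, that might look like the following sketch (the broad except is kept from the answer for brevity; in real code you would catch NoSuchElementException):
candidate_xpaths = [
    '//*[@id="xl_dijit-bootstrap_Button_99"]',
    '//*[@id="xl_dijit-bootstrap_Button_7"]',
    '//*[@id="xl_dijit-bootstrap_Button_27"]',
]
for xpath in candidate_xpaths:
    try:
        source8 = driver.find_element_by_xpath(xpath)
        ActionChains(driver).click(source8).perform()
        break  # stop at the first id that works
    except Exception:
        print(xpath + " failed!")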
You can use the contains() XPath function to match the id prefix first and then perform the necessary actions:
elementId = driver.find_element_by_xpath("//input[contains(@id, 'xl_dijit-bootstrap_Button_')]")
elementId.click()
Otherwise, if you wish to perform a different action for each specific ID, retrieve the element's id attribute and dispatch on it ('idTextAttribute' below) with a switch-case-like construct.
idTextAttribute = elementId.get_attribute("id")

def SwitchToId(idTextAttribute):
    switcher = {
        # each value is a callable: do something, like click or send_keys
        "xl_dijit-bootstrap_Button_99": lambda: elementId.click(),
        "xl_dijit-bootstrap_Button_7": lambda: elementId.click(),
        "xl_dijit-bootstrap_Button_27": lambda: elementId.click(),
    }
    return switcher.get(idTextAttribute, "ID not Found")
Note: Python doesn't have a switch-case statement like Java, so you can use a dict-based switcher like this or an if-elif block.
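For completeness, the equivalent if-elif block might look like this sketch (replace each click() with whatever action that id actually needs):
if idTextAttribute == "xl_dijit-bootstrap_Button_99":
    elementId.click()
elif idTextAttribute == "xl_dijit-bootstrap_Button_7":
    elementId.click()
elif idTextAttribute == "xl_dijit-bootstrap_Button_27":
    elementId.click()
else:
    print("ID not Found")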

Finding if some tag exists in HTML response and printing if/else accordingly

I am trying to collect data from a website using Python. The webpage contains multiple listings of software, and within each listing my data sits in an h5 tag with a certain class ('price_software_details').
However, in some cases the tag, along with the data, is missing. I want to print an 'NA' message if the data and tag are missing; otherwise it should print the data.
I tried the code below, though it's not working.
Help please!
interest = soup.find(id='allsoftware')
for link in interest.findAll('h5'):
    if link.find(class_='price_software_details') == True:
        print(link.getText())
    else:
        print('NA')
Have you tried error handling (try, except)?
interest = soup.find(id='allsoftware')
for link in interest.findAll('h5'):
    try:
        item = link.find(class_='price_software_details')
        print(item.get_text())
    except AttributeError:
        # item is None when the tag is missing, so get_text() raises AttributeError
        print('NA')
You need to know that soup.find() will never be True. It only returns either a result or None.
interest = soup.find(id='allsoftware')
for link in interest.findAll('h5'):
    if link.find(class_='price_software_details'):
        print(link.getText())
    else:
        print('NA')

TypeError: 'NoneType' object is not callable when scraping

I am new to Python and am trying to scrape data from the web to (eventually) feed a small database.
My code is generating a NoneType error. Could you assist?
import urllib2
from bs4 import BeautifulSoup

# 1- create files for leagues, stock data and errors
FLeague = open("C:\Python\+exercice\SoccerLeague.txt", "w")
FData = open("C:\Python\+exercice\FootballDump.txt", "w")
ErrorFile = open("C:\Python\+exercice\ErrorFootballScrap.txt", "w")

# Open the website, grab the data and get the errors too
soup = BeautifulSoup(urllib2.urlopen("http://www.soccerstats.com/leagues.asp").read(), "html")
TableLeague = soup.find("table", {"class": "trow8"})
print TableLeague

# Here I just want to grab the country name
for row in TableLeague("tr")[2:]:
    col = row.findAll("td")
    # I try to identify errors
    try:
        country = col[1].a.string.stip()
        FLeague.write(country + "\n")
    except Exception as e:
        ErrorFile.write(country + ";" + str(e) + ";" + str(col) + "\n")
        pass

# close the files
FLeague.close
FData.close
ErrorFile.close
The first problem comes from:
TableLeague("tr")[2:]
TableLeague is None here, since there is no table element with the class trow8. Instead, use the id attribute to find the desired table element:
TableLeague = soup.find("table", {"id": "btable"})
Also, you probably meant strip() and not stip() here: col[1].a.string.stip().
And, in order to close the files, call the close() method. Replace:
FLeague.close
FData.close
ErrorFile.close
with:
FLeague.close()
FData.close()
ErrorFile.close()
Or, even better, use the with context manager to work with files; then you would not need to close the files explicitly.
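A minimal sketch, reusing one of the file paths from the question (a file opened in a with block is closed automatically, even if an exception is raised inside it):
with open("C:\Python\+exercice\SoccerLeague.txt", "w") as FLeague:
    FLeague.write(country + "\n")
# FLeague is already closed here, no explicit close() needed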

Python: return empty value on exception

I have some experience in Python, but I have never used try and except to catch errors, due to a lack of formal training.
I am working on extracting a few articles from Wikipedia. For this I have an array of titles, a few of which do not have any article or search result behind them. I would like the page retrieval function to just skip those few names and continue running the script on the rest. Reproducible code follows.
import wikipedia
# This one works.
links = ["CPython"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
#The sequence breaks down if there is no wikipedia page.
links = ["CPython","no page"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
The library implements it with a method like this. Normally changing it would be really bad practice, but since this is just a one-off data extraction, I am willing to modify my local copy of the library to get it to work. Edit: I have now included the complete function.
def page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False):
    '''
    Get a WikipediaPage object for the page with title `title` or the pageid
    `pageid` (mutually exclusive).

    Keyword arguments:
    * title - the title of the page to load
    * pageid - the numeric pageid of the page to load
    * auto_suggest - let Wikipedia find a valid page title for the query
    * redirect - allow redirection without raising RedirectError
    * preload - load content, summary, images, references, and links during initialization
    '''
    if title is not None:
        if auto_suggest:
            results, suggestion = search(title, results=1, suggestion=True)
            try:
                title = suggestion or results[0]
            except IndexError:
                # if there is no suggestion or search results, the page doesn't exist
                raise PageError(title)
        return WikipediaPage(title, redirect=redirect, preload=preload)
    elif pageid is not None:
        return WikipediaPage(pageid=pageid, preload=preload)
    else:
        raise ValueError("Either a title or a pageid must be specified")
What should I do to retrieve only the pages that do not raise the error? Maybe there is a way to filter out all items in the list that raise this error, or an error of some kind. Returning "NA" or similar would be fine for pages that don't exist; skipping them without notice would be fine too. Thanks!
The function wikipedia.page will raise a wikipedia.exceptions.PageError if the page doesn't exist. That's the error you want to catch.
import wikipedia

links = ["CPython", "no page"]
test = []
for link in links:
    try:
        # try to load the wikipedia page
        page = wikipedia.page(link, auto_suggest=False)
        test.append(page)
    except wikipedia.exceptions.PageError:
        # if a "PageError" was raised, ignore it and continue to the next link
        continue
You have to surround the wikipedia.page call with a try block, so I'm afraid you can't use a list comprehension.
Understand that this is bad practice, but for a one-off quick and dirty script you can just catch the error and pass.
Edit: Wait, sorry. I've just noticed the list comprehension. This won't work without breaking that down:
links = ["CPython", "no page"]
test = []
for link in links:
try:
page = wikipedia.page(link, auto_suggest=False)
test.append(page)
except wikipedia.exceptions.PageError:
pass
test = [testitem.content for testitem in test]
print(test)
pass essentially tells Python to trust you and ignore the error so that it can continue on about its day.
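If you still want something close to the original list comprehension, one option is to wrap the try/except in a small helper that returns None on failure and then filter the Nones out (safe_page is a name made up for this sketch):
import wikipedia

def safe_page(link):
    # return the WikipediaPage for link, or None if the page doesn't exist
    try:
        return wikipedia.page(link, auto_suggest=False)
    except wikipedia.exceptions.PageError:
        return None

links = ["CPython", "no page"]
pages = [safe_page(link) for link in links]
test = [page.content for page in pages if page is not None]
print(test)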
