Checking the clickability of an element in Selenium using Python
I've been trying to write a script which will give me all the links to the episodes on this page: http://www.funimation.com/shows/assassination-classroom/videos/episodes
Since the links can be seen in the 'Outer HTML', I used Selenium and PhantomJS with Python.
Link Example: http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time
However, I can't seem to get my code right. I do have a basic idea of what I want to do. Here's the process:
1.) Copy the Outer HTML of the very first page and save it as a 'Source_html' file.
2.) Look for links inside this file.
3.) Move to the next page to see the rest of the videos and their links.
4.) Repeat step 2.
This is what my code looks like:
from selenium import webdriver
from selenium import selenium
from bs4 import BeautifulSoup
import time
# ---------------------------------------------------------------------------------------------
driver = webdriver.PhantomJS()
driver.get('http://www.funimation.com/shows/assassination-classroom/videos/episodes')
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
f = open('source_code.html', 'w')
f.write(source_code.encode('utf-8'))
f.close()

print 'Links On First Page Are : \n'

soup = BeautifulSoup('source_code.html')
subtitles = soup.find_all('div', {'class': 'popup-heading'})
official = 'something'
for official in subtitles:
    x = official.findAll('a')
    for a in x:
        print a['href']

sbtn = driver.find_element_by_link_text(">"):
print sbtn

print 'Entering The Loop Now'

for driver.find_element_by_link_text(">"):
    sbtn.click()
    time.sleep(3)
    elem = driver.find_element_by_xpath("//*")
    source_code = elem.get_attribute("outerHTML")
    f = open('source_code1.html', 'w')
    f.write(source_code.encode('utf-8'))
    f.close()
Things I already know:
soup = BeautifulSoup('source_code.html') won't work, because I need to open this file via Python and feed its contents into BeautifulSoup. That I can manage.
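What I mean is roughly this (a minimal sketch: read the saved file back in and hand its contents to BeautifulSoup; 'html.parser' is just the built-in parser, any installed parser would do):

from bs4 import BeautifulSoup

# open the HTML file that was saved earlier and parse its contents
with open('source_code.html', 'r') as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')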
The official variable isn't really doing anything; it's just helping me start a loop.
for driver.find_element_by_link_text(">"):
Now, this is what I need to fix somehow. I'm not sure how to check whether this element is still clickable or not. If it is, proceed to the next page, get the links, click it again to go to page 3, and repeat the process.
Any help would be appreciated.
You don't need to use BeautifulSoup here at all. Just grab all the links via Selenium, and proceed to the next page only if the > link is visible. Here is a complete implementation, including gathering the links and the necessary waits. It should work for any page count:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get("http://www.funimation.com/shows/assassination-classroom/videos/episodes")

wait = WebDriverWait(driver, 10)
links = []
while True:
    # wait for the page to load
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.item-title")))

    # wait until the loading circle becomes invisible
    wait.until(EC.invisibility_of_element_located((By.ID, "loadingCircle")))

    links.extend([link.get_attribute("href") for link in driver.find_elements_by_css_selector("a.item-title")])

    print("Parsing page number #" + driver.find_element_by_css_selector("a.jp-current").text)

    # click next
    next_link = driver.find_element_by_css_selector("a.next")
    if not next_link.is_displayed():
        break

    next_link.click()
    time.sleep(1)  # hardcoded delay

print(len(links))
print(links)
For the URL mentioned in the question, it prints:
Parsing page number #1
Parsing page number #2
93
['http://www.funimation.com/shows/assassination-classroom/videos/official/assassination-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/assassination-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/assassination-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/baseball-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/baseball-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/baseball-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/grown-up-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/grown-up-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/grown-up-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/assembly-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/assembly-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/assembly-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/test-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/test-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/test-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time1st-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time1st-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time1st-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/school-trip-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/l-and-r-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/l-and-r-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/l-and-r-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/transfer-student-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/ball-game-tournament-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/ball-game-tournament-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/ball-game-tournament-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/talent-time', 
'http://www.funimation.com/shows/assassination-classroom/videos/official/talent-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/talent-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/vision-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/vision-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/vision-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/end-of-term-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/end-of-term-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/end-of-term-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/schools-out1st-term', 'http://www.funimation.com/shows/assassination-classroom/videos/official/schools-out1st-term', 'http://www.funimation.com/shows/assassination-classroom/videos/official/schools-out1st-term', 'http://www.funimation.com/shows/assassination-classroom/videos/official/island-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/island-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/island-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/action-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/action-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/action-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/pandemonium-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/pandemonium-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/pandemonium-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time2nd-period', 'http://www.funimation.com/shows/assassination-classroom/videos/official/karma-time2nd-period', 'http://www.funimation.com/shows/deadman-wonderland', 'http://www.funimation.com/shows/deadman-wonderland', 'http://www.funimation.com/shows/riddle-story-of-devil', 'http://www.funimation.com/shows/riddle-story-of-devil', 'http://www.funimation.com/shows/soul-eater', 'http://www.funimation.com/shows/soul-eater', 'http://www.funimation.com/shows/assassination-classroom/videos/official/xx-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/xx-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/xx-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/nagisa-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/nagisa-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/nagisa-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/summer-festival-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/summer-festival-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/summer-festival-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/kaede-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/kaede-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/kaede-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/itona-horibe-time', 
'http://www.funimation.com/shows/assassination-classroom/videos/official/itona-horibe-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/itona-horibe-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/spinning-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/spinning-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/spinning-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/leader-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/leader-time', 'http://www.funimation.com/shows/assassination-classroom/videos/official/leader-time', 'http://www.funimation.com/shows/deadman-wonderland', 'http://www.funimation.com/shows/deadman-wonderland', 'http://www.funimation.com/shows/riddle-story-of-devil', 'http://www.funimation.com/shows/riddle-story-of-devil', 'http://www.funimation.com/shows/soul-eater', 'http://www.funimation.com/shows/soul-eater']
Basically, I use webelement.is_displayed() to check whether it is clickable or not:
isLinkDisplay = driver.find_element_by_link_text(">").is_displayed()
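Note that the element lookup itself raises NoSuchElementException if the ">" link isn't in the DOM at all, so a defensive variant (a sketch, not part of the original answer, assuming driver is the WebDriver instance from the question) could look like this:

from selenium.common.exceptions import NoSuchElementException

try:
    # True only if the ">" link exists and is currently visible
    can_go_next = driver.find_element_by_link_text(">").is_displayed()
except NoSuchElementException:
    can_go_next = False  # the link is not present at all (e.g. on the last page)

if can_go_next:
    driver.find_element_by_link_text(">").click()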
Related
How do I make the driver navigate to a new page in Selenium with Python
I am trying to write a script to automate job applications on LinkedIn using selenium and python. The steps are simple: open the LinkedIn page, enter id and password and log in; open https://linkedin.com/jobs and enter the search keyword and location and click search (directly opening links like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia gets stuck loading, probably due to lack of some post information from the previous page); the click opens the job search page, but this doesn't seem to update the driver, as it still searches on the previous page.

import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd
import yaml

driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
url = "https://linkedin.com/"
driver.get(url)
content = driver.page_source

stream = open("details.yaml", 'r')
details = yaml.safe_load(stream)

def login():
    username = driver.find_element_by_id("session_key")
    password = driver.find_element_by_id("session_password")
    username.send_keys(details["login_details"]["id"])
    password.send_keys(details["login_details"]["password"])
    driver.find_element_by_class_name("sign-in-form__submit-button").click()

def get_experience():
    return "1%C22"

login()

jobs_url = f'https://www.linkedin.com/jobs/'
driver.get(jobs_url)

keyword = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-keyword-id-ember')]")
location = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-location-id-ember')]")
keyword.send_keys("python")
location.send_keys("Australia")
driver.find_element_by_xpath("//button[normalize-space()='Search']").click()
WebDriverWait(driver, 10)
# content = driver.page_source
# soup = BeautifulSoup(content)
# with open("a.html", 'w') as a:
#     a.write(str(soup))
print(driver.current_url)

driver.current_url returns https://linkedin.com/jobs/ instead of https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia as it should. I have tried to print the content to a file; it is indeed from the previous jobs page and not from the search page. I have also tried to search for elements from the page, like experience and the easy apply button, but the search results in a not-found error. I am not sure why this isn't working. Any ideas? Thanks in advance.

UPDATE
It works if I directly open something like https://www.linkedin.com/jobs/search/?f_AL=True&f_E=2&keywords=python&location=Australia but not https://www.linkedin.com/jobs/search/?f_AL=True&f_E=1%2C2&keywords=python&location=Australia. The difference between these links is that one of them takes only one value for experience level while the other takes two values. This means it's probably not a POST values issue.
You are getting and printing the current URL immediately after clicking the search button, before the page has changed with the response received from the server. This is why it prints https://linkedin.com/jobs/ instead of something like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia. WebDriverWait(driver, 10) or wait = WebDriverWait(driver, 20) will not cause any kind of delay the way time.sleep(10) does. wait = WebDriverWait(driver, 20) only instantiates a wait object, an instance of the WebDriverWait class; you still have to call its until() method with an expected condition for it to actually wait for anything.
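For example (a rough sketch, not from the original answer; it assumes driver is the existing WebDriver instance and that the post-search URL contains /jobs/search):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
# block for up to 20 seconds until the browser URL reflects the search results page
wait.until(EC.url_contains("/jobs/search"))
print(driver.current_url)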
Web scraping when scrolling down is needed
I want to scrape, e.g., the title of the first 200 questions under the web page https://www.quora.com/topic/Stack-Overflow-4/all_questions. And I tried the following code:

import requests
from bs4 import BeautifulSoup

url = "https://www.quora.com/topic/Stack-Overflow-4/all_questions"
print("url")
print(url)
r = requests.get(url)  # HTTP request
print("r")
print(r)
html_doc = r.text  # Extracts the html
print("html_doc")
print(html_doc)
soup = BeautifulSoup(html_doc, 'lxml')  # Create a BeautifulSoup object
print("soup")
print(soup)

It gave me the text at https://pastebin.com/9dSPzAyX. If we search for href='/, we can see that the HTML does contain the titles of some questions. However, the problem is that the number is not enough; on the actual web page, a user needs to manually scroll down to trigger extra loading. Does anyone know how I could mimic "scrolling down" programmatically to load more content from the page?
Infinite scroll on a webpage is based on JavaScript functionality. Therefore, to find out what URL we need to access and what parameters to use, we need to either thoroughly study the JS code working inside the page or, preferably, examine the requests that the browser makes when you scroll down the page. We can study the requests using the Developer Tools: in the Quora example, the more you scroll down, the more requests are generated. Your requests will then be made to that URL instead of the normal page URL, but keep in mind that you have to send the correct headers and payload. Another, easier solution is to use Selenium.
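A rough sketch of that request-replay idea (every value below -- the endpoint, the headers, and the payload -- is a placeholder to be copied from the Network tab of the Developer Tools, not a real Quora API):

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0",         # placeholder; copy the real header from DevTools
    "Content-Type": "application/json",  # placeholder
})

# placeholder payload; copy the real one from the recorded request
payload = {"page": 2}

# placeholder endpoint; use the URL the page actually calls while scrolling
response = session.post("https://example.com/ajax/load_more", json=payload)
print(response.status_code)
print(response.text[:200])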
I couldn't find a way to do it with requests, but you can use Selenium. The script first prints the number of questions on the initial load, then sends the End key to mimic scrolling down. You can see the number of questions go from 20 to 40 after sending the End key. I wait 5 seconds before reading the DOM again, in case the script runs faster than the page loads; you can improve this by using explicit waits (EC) with Selenium. The page loads 20 questions per scroll, so if you are looking to scrape 100 questions, you need to send the End key 5 times. To use the code below you need to install chromedriver: http://chromedriver.chromium.org/downloads

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

CHROMEDRIVER_PATH = ""
CHROME_PATH = ""
WINDOW_SIZE = "1920,1080"

chrome_options = Options()
# chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.binary_location = CHROME_PATH
prefs = {'profile.managed_default_content_settings.images': 2}
chrome_options.add_experimental_option("prefs", prefs)

url = "https://www.quora.com/topic/Stack-Overflow-4/all_questions"

def scrape(url, times):
    if not url.startswith('http'):
        raise Exception('URLs need to start with "http"')
    driver = webdriver.Chrome(
        executable_path=CHROMEDRIVER_PATH,
        chrome_options=chrome_options
    )
    driver.get(url)
    counter = 1
    while counter <= times:
        # count the questions currently in the DOM
        q_list = driver.find_element_by_class_name('TopicAllQuestionsList')
        questions = [x for x in q_list.find_elements_by_xpath('//div[@class="pagedlist_item"]')]
        q_len = len(questions)
        print(q_len)
        # send the End key to trigger the infinite scroll
        html = driver.find_element_by_tag_name('html')
        html.send_keys(Keys.END)
        wait = WebDriverWait(driver, 5)
        time.sleep(5)
        # count again after the new batch has loaded
        questions2 = [x for x in q_list.find_elements_by_xpath('//div[@class="pagedlist_item"]')]
        print(len(questions2))
        counter += 1
    driver.close()

if __name__ == '__main__':
    scrape(url, 5)
I recommend using Selenium rather than BeautifulSoup here. Selenium can control the browser as well as do the parsing, e.g. scroll down, click a button, etc. This example scrolls down to get all the users who liked a post on Instagram: https://stackoverflow.com/a/54882356/5611675
If the content only loads on "scrolling down", the page is probably using JavaScript to load the content dynamically. You can try using a web client such as PhantomJS to load the page and execute the JavaScript in it, and simulate the scroll by injecting some JS such as document.body.scrollTop = sY; (see Simulate scroll event using Javascript).
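A rough sketch of that idea through Selenium's JS injection (assuming driver is an already created WebDriver instance; the 2-second pause is an arbitrary allowance for the new content to load):

import time

# keep scrolling to the bottom until the page height stops growing,
# i.e. the infinite scroll has no more content to load
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the page time to fetch and render the next batch
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height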
Python and Selenium: I am automating web scraping among pages. How can I loop by Next button?
I have already written several lines of code to pull URLs from this website: http://www.worldhospitaldirectory.com/United%20States/hospitals. The code is below:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import csv

driver = webdriver.Firefox()
driver.get('http://www.worldhospitaldirectory.com/United%20States/hospitals')

url = []
pagenbr = 1

while pagenbr <= 115:
    current = driver.current_url
    driver.get(current)
    lks = driver.find_elements_by_xpath('//*[@href]')
    for ii in lks:
        link = ii.get_attribute('href')
        if '/info' in link:
            url.append(link)
    print('page ' + str(pagenbr) + ' is done.')
    if pagenbr <= 114:
        elm = driver.find_element_by_link_text('Next')
        driver.implicitly_wait(10)
        elm.click()
        time.sleep(2)
    pagenbr += 1

ls = list(set(url))
with open('US_GeneralHospital.csv', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for u in ls:
        wr.writerow([u])

And it worked very well to pull each individual link from this website. But the problem is that I have to set the number of pages to loop over manually every time. I want to upgrade this code so that it works out how many iterations it needs by itself, not by manual input. Thank you very much.
It is a bad idea to hardcode the number of pages in your script. Try just clicking the "Next" button while it is enabled:

from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        # do whatever you need to do on the page
        driver.find_element_by_xpath('//li[not(@class="disabled")]/span[text()="Next"]').click()
    except NoSuchElementException:
        break

This should let you keep scraping pages until the last page is reached. Also note that the lines current = driver.current_url and driver.get(current) make no sense at all, so you can skip them.
Selenium loop page refreshed in python
I have some questions related to looping with Selenium in Python. I want to iterate over a list of links found by driver.find_elements_by_id and click on them one by one, but the problem is that each time I click on a link ('linklist' in the code), the page is refreshed, so I get an error message saying 'Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.' I know the reason is that the list of links disappears after the click. But how can I iterate over the list in Selenium even though the page doesn't exist anymore? I used driver.back() and apparently it doesn't work. The error message pops up after this line in the code: link.click(). The linklist is located at this URL (I want to click on the Documents button and then download the first file after the refreshed page is displayed): https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001467373&type=10-K&dateb=20101231&owner=exclude&count=40
Can someone have a look at this problem? Thank you!

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import unittest
import os
import time
import requests
import html2text

class LoginTest(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.get("https://www.sec.gov/edgar/searchedgar/companysearch.html")

    def test_Login(self):
        driver = self.driver
        cikID = "cik"
        searchButtonID = "cik_find"
        typeID = "//*[@id='type']"
        priorID = "prior_to"
        cik = "00001467373"
        Type = "10-K"
        prior = "20101231"
        search2button = "//*[@id='contentDiv']/div[2]/form/table/tbody/tr/td[6]/input[1]"
        documentsbuttonid = "documentsbutton"
        formbuttonxpath = '//a[text()="d10k.htm"]'

        cikElement = WebDriverWait(driver, 30).until(lambda driver: driver.find_element_by_id(cikID))
        cikElement.clear()
        cikElement.send_keys(cik)
        searchButtonElement = WebDriverWait(driver, 20).until(lambda driver: driver.find_element_by_id(searchButtonID))
        searchButtonElement.click()
        typeElement = WebDriverWait(driver, 30).until(lambda driver: driver.find_element_by_xpath(typeID))
        typeElement.clear()
        typeElement.send_keys(Type)
        priorElement = WebDriverWait(driver, 30).until(lambda driver: driver.find_element_by_id(priorID))
        priorElement.clear()
        priorElement.send_keys(prior)
        search2Element = WebDriverWait(driver, 30).until(lambda driver: driver.find_element_by_xpath(search2button))
        search2Element.send_keys(Keys.SPACE)
        time.sleep(1)
        documentsButtonElement = WebDriverWait(driver, 20).until(lambda driver: driver.find_element_by_id(documentsbuttonid))
        a = driver.current_url
        window_be1 = driver.window_handles[0]
        linklist = driver.find_elements_by_id(documentsbuttonid)
        with open("D:/doc2/" + "a" + ".txt", mode="w", errors="ignore") as newfile:
            for link in linklist:
                link.click()
                formElement = WebDriverWait(driver, 30).until(lambda driver: driver.find_element_by_xpath(formbuttonxpath))
                formElement.click()
                time.sleep(1)
                t = driver.current_url
                r = requests.get(t)
                data = r.text
                newfile.write(html2text.html2text(data))
                drive.back()
                drive.back()

    def terdown(self):
        self.driver.quit()

if __name__ == '__main__':
    unittest.main()
You should not keep a list of web elements, but a list of links (hrefs). Try something like this:

linklist = []
for link in driver.find_elements_by_xpath('//h4[@class="title"]/a'):
    linklist.append(link.get_attribute('href'))

Then you can iterate through the list of links:

for link in linklist:
    driver.get(link)
    # do some actions on the page

If you want to physically click on each link, you might need to use

for link in linklist:
    driver.find_element_by_xpath('//h4[@class="title"]/a[@href="%s"]' % link).click()
    # do some actions on the page
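A follow-up sketch (not part of the original answer): if each click navigates away from the list page, you also need to go back before locating the next link; re-finding the element on every pass is what avoids the stale-reference error:

for link in linklist:
    # re-locate the anchor on every iteration so the reference is fresh
    driver.find_element_by_xpath('//h4[@class="title"]/a[@href="%s"]' % link).click()
    # ... do whatever is needed on the opened page ...
    driver.back()  # return to the list page before the next iteration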
Scrape with BeautifulSoup from site that uses AJAX pagination using Python
I'm fairly new to coding and Python, so I apologize if this is a silly question. I'd like a script that goes through all 19,000 search result pages and scrapes each page for all of the URLs. I've got all of the scraping working but can't figure out how to deal with the fact that the page uses AJAX to paginate. Usually I'd just make a loop over the URL to capture each search result, but that's not possible here. Here's the page: http://www.heritage.org/research/all-research.aspx?nomobile&categories=report
This is the script I have so far:

import io
import urllib2
from bs4 import BeautifulSoup

with io.open('heritageURLs.txt', 'a', encoding='utf8') as logfile:
    page = urllib2.urlopen("http://www.heritage.org/research/all-research.aspx?nomobile&categories=report")
    soup = BeautifulSoup(page)
    snippet = soup.find_all('a', attrs={'item-title'})
    for a in snippet:
        logfile.write("http://www.heritage.org" + a.get('href') + "\n")

print "Done collecting urls"

Obviously, it scrapes the first page of results and nothing more. I have looked at a few related questions, but none seem to use Python, or at least not in a way that I can understand. Thank you in advance for your help.
For the sake of completeness: while you could try to access the POST request and find a way around it to reach the next page, as I suggested in my comment, if an alternative is acceptable, using Selenium makes it quite easy to achieve what you want. Here is a simple solution using Selenium for your question:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep

# uncomment if using Firefox web browser
driver = webdriver.Firefox()
# uncomment if using PhantomJS
#driver = webdriver.PhantomJS()

url = 'http://www.heritage.org/research/all-research.aspx?nomobile&categories=report'
driver.get(url)

# set initial page count
pages = 1

with open('heritageURLs.txt', 'w') as f:
    while True:
        try:
            # sleep here to allow time for page load
            sleep(5)
            # grab the Next button if it exists
            btn_next = driver.find_element_by_class_name('next')
            # find all item-title a href and write to file
            links = driver.find_elements_by_class_name('item-title')
            print "Page: {} -- {} urls to write...".format(pages, len(links))
            for link in links:
                f.write(link.get_attribute('href') + '\n')
            # exit if no more Next button is found, i.e. last page
            if btn_next is None:
                print "crawling completed."
                exit(-1)
            # otherwise click the Next button and repeat crawling the urls
            pages += 1
            btn_next.send_keys(Keys.RETURN)
        # you should specify the exception here
        except:
            print "Error found, crawling stopped"
            exit(-1)

Hope this helps.