I am trying to keep clicking through to the next page on this website, appending the table data to a CSV file each time. When I reach the last page, I want to append its table data and break out of the while loop.
Unfortunately, the script stays on the last page and never exits, and I have tried several different ways of catching the error:
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
    except:
        print("No more pages left")
        break
driver.quit()
I also tried this one:
try:
    link = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.pagination-next a')))
    driver.execute_script('arguments[0].scrollIntoView();', link)
    link.click()
except:
    keep_going = False
I've tried putting print statements in, and it just keeps staying on the last page.
Here is the HTML of the Next button on the first page and on the last page; I'm not sure whether I could make use of it:
HTML for first page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next" style="">Next</li>
HTML for last page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next disabled" style="">Next</li>
You can solve the problem as follows.
The Next button is enabled on every page except the last one, where it is disabled.
So you can build two lists, one for the enabled button element and one for the disabled button element. At any point in time, exactly one of the two lists will have a size of one. If the element is disabled, break out of the while loop; otherwise click the Next button.
I am not familiar with Python syntax, so you can convert the Java code below and use it. It should work.
Code:
boolean hasNextPage = true;
while (hasNextPage) {
    List<WebElement> enabled_next_page_btn = driver.findElements(By.xpath("//li[@class='pagination-next']/a"));
    List<WebElement> disabled_next_page_btn = driver.findElements(By.xpath("//li[@class='pagination-next disabled']/a"));
    // If the Next button is enabled/available, enabled_next_page_btn will have a size of one,
    // so you can click it and then process the new page.
    if (enabled_next_page_btn.size() > 0) {
        enabled_next_page_btn.get(0).click();
        hasNextPage = true;
    } else if (disabled_next_page_btn.size() > 0) {
        System.out.println("No more Pages Available");
        break;
    }
}
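Since the question is in Python, here is a rough Python port of the Java loop above. It is a sketch, not something tested against the real site. The function name click_through_pages, the on_page callback, and the max_pages cap are made up for illustration; the XPaths are the ones from the answer. The locator strategy is passed as the plain string "xpath", which is the value of Selenium's By.XPATH, so the function only needs a driver object that offers find_elements.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH

ENABLED_NEXT = "//li[@class='pagination-next']/a"
DISABLED_NEXT = "//li[@class='pagination-next disabled']/a"


def click_through_pages(driver, on_page, max_pages=1000):
    """Call on_page(driver) on every page, clicking Next until it is disabled.

    Hypothetical helper: max_pages is a safety cap so the loop cannot spin
    forever if the site behaves unexpectedly. Returns the pages visited.
    """
    visited = 0
    while visited < max_pages:
        on_page(driver)  # e.g. append the table rows to the CSV file
        visited += 1
        enabled = driver.find_elements(XPATH, ENABLED_NEXT)
        if not enabled:
            # Either the disabled variant is present or the button is
            # missing; in both cases there is nothing left to click.
            print("No more pages left")
            break
        enabled[0].click()
    return visited
```

Processing the page before looking for the button means the last page is still scraped before the loop exits.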
Calling next_page_btn.index(0).click() wasn't working (a list's index() searches for a value; next_page_btn[0] is what returns the first element), but checking the len of next_page_btn worked for detecting the last page, so I was able to do this:
while True:
    next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
    if len(next_page_btn) < 1:
        print("No more pages left")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
Thanks so much for the help!
How about using a do/while loop and exiting once the class attribute of the Next button includes "disabled"? (Excuse the syntax. I just threw this together and haven't tried it.)
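# Python has no do/while, so loop forever and break once "disabled" shows up
while True:
    # In the question's HTML the class sits on the <li> that wraps the Next link
    element = driver.find_element(By.CSS_SELECTOR, "li.pagination-next")
    class_attribute = element.get_attribute("class")
    if "disabled" in class_attribute:
        break
    element.find_element(By.LINK_TEXT, "Next").click()
driver.quit()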
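One detail worth spelling out for the class check: get_attribute("class") returns the whole attribute as a single string (for the last page in the question, "pagination-next disabled"). Below is a minimal sketch of testing it by whole class names. has_class is a hypothetical helper, not part of Selenium.

```python
def has_class(class_attribute, name):
    """Return True if `name` is one of the space-separated class names.

    `class_attribute` is what element.get_attribute("class") would return;
    it may be None when the attribute is missing.
    """
    return name in (class_attribute or "").split()


# Values taken from the HTML in the question
print(has_class("pagination-next", "disabled"))           # first page
print(has_class("pagination-next disabled", "disabled"))  # last page
```

Splitting on whitespace avoids the false positives a plain substring test can give when another class name merely contains the word "disabled".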
The XPath to the button is:
//li[@class = 'pagination-next']/a
so every time you need to load the next page you can click on this element:
next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
next_page_btn[0].click()
Note: you should add some logic to stop on the last page:
while True:
    next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
    if len(next_page_btn) < 1:
        print("No more pages left")
        break
    else:
        # do stuff, then move on to the next page
        next_page_btn[0].click()
Related
I am trying to loop through the pages and print the values of the table, but the script can't click the Next button.
Error:
selenium.common.exceptions.ElementClickInterceptedException: Message:
Element <a class="page-link" href="javascript:move('generic-tokenholders2?a=0xB8c77482e45F1F44dE1745F52C74426C631bDD52&sid=&m=normal&s=16579517055253348798759097&p=2')">
is not clickable at point (1148,2553) because another element <div id="overlay"> obscures it
Page: https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances
My Code:
driver.get("https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances")
wait = WebDriverWait(driver,30)
# num=driver.find_element(By.CSS_SELECTOR, "/html/body/div[2]/div[3]/div/div/ul/li[3]/span/strong[2]").getText()
for i in range(1,20):
    time.sleep(5)
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID,"tokeholdersiframe")))
    print("udj")
    simpleTable = driver.find_element(By.XPATH,"/html/body/div[2]/div[3]/table")
    rows = driver.find_elements(By.TAG_NAME,"tr")
    for i in range(1,len(rows)):
        cols = rows[i].find_elements(By.TAG_NAME,"td")
        for g in cols:
            print(g.text)
    next = wait.until(EC.element_to_be_clickable((By.XPATH,"//div[@class='d-inline-block']//a[@aria-label='Next']")))
    driver.execute_script("arguments[0].scrollIntoView(true);",next)
    driver.execute_script("window.scrollBy(0,-200);")
    next.click()
    driver.switch_to.default_content()
Update:
Error pmadhu:
If you look at the loop, you are switching to an iframe, performing some actions, and clicking Next.
When the second page opens, the scope is still the previous page's iframe, and you are trying to find an iframe inside it, which is not the correct workflow.
You need to call switch_to.default_content() after clicking Next and then repeat the same steps on each page.
The code below clicked Next without any exception:
driver.get("https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances")
wait = WebDriverWait(driver,30)
for i in range(5):
    time.sleep(5)
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID,"tokeholdersiframe")))
    next = wait.until(EC.element_to_be_clickable((By.XPATH,"//div[@class='d-inline-block']//a[@aria-label='Next']")))
    driver.execute_script("arguments[0].scrollIntoView(true);",next)
    driver.execute_script("window.scrollBy(0,-200);")
    next.click()
    driver.switch_to.default_content()
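One way to make sure the switch back to the top-level document is never skipped, even when the click raises, is to wrap the frame switch in a context manager. This is only a sketch: inside_frame is a hypothetical helper name, and it relies solely on driver.switch_to.frame() and driver.switch_to.default_content().

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame and always switch back out afterwards.

    Hypothetical helper; frame_reference is whatever
    driver.switch_to.frame() accepts (a name/id string, index, or element).
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if scraping or the click raises, so the next
        # iteration starts from the top-level document again.
        driver.switch_to.default_content()


# Usage sketch inside the page loop:
# with inside_frame(driver, "tokeholdersiframe"):
#     ...read the table, then click Next...
```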
I am trying to scrape a long list of books spread over 10 web pages. When the loop clicks the next > button for the first time, the website displays a login overlay, so Selenium cannot find the target elements.
I have tried every solution I could think of:
Using some Chrome options.
Using try-except to click the X button on the overlay. The overlay appears only once (when next > is clicked for the first time). The problem is that when I put this try-except block at the end of the while True: loop, the loop became infinite, because I use continue in the except clause since I do not want to break the loop.
Adding popup-blocker extensions to Chrome, but they do not work when I run the code, even though I add the extension using options.add_argument('load-extension=' + ExtensionPath).
This is my code:
options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')
driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)
review_dict = {'title':[], 'author':[],'rating':[]}
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_ = 'tableList js-dataTooltip')
while True:
    table = driver.find_element_by_xpath('//*[@id="all_votes"]/table')
    for product in table.find_elements_by_xpath(".//tr"):
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)
        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)
        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
    except NoSuchElementException:
        continue
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
    except TimeoutException:
        break
df = pd.DataFrame.from_dict(review_dict)
df
Any help would be appreciated: for example, whether I can change the while loop to a for loop that clicks the next > button until the end, where I should put the try-except block that closes the overlay, or whether there is a Chrome option that disables the overlay.
Thanks in advance
Thank you for sharing your code and the website that you are having trouble with. I was able to close the login modal by using XPath. I took up the challenge and split the code into classes: one wraps the selenium.webdriver.chrome.webdriver driver, and the other represents the page that you wanted to scrape ( https://www.goodreads.com/list/show/32339 ). In the following methods, I used the JavaScript call return arguments[0].scrollIntoView(); to scroll to the last book displayed on the page. After that, I was able to click the next button.
def scroll_to_element(self, xpath : str):
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)
def get_book_count(self):
    return len(self.chrome_driver.find_elements(By.XPATH, "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"))
def click_next_page(self):
    # Scroll to last record and click "next page"
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()
Once I clicked on the "Next" button, I saw the modal display. I was able to find the xpath for the modal and was able to close the modal.
def is_displayed(self, xpath: str, timeout: int = 5):
    # DriverWait and DriverConditions appear to be import aliases for
    # WebDriverWait and expected_conditions (an assumption; the imports are not shown)
    try:
        webElement = DriverWait(self.chrome_driver, timeout).until(
            DriverConditions.presence_of_element_located(locator = (By.XPATH, xpath))
        )
        return webElement is not None
    except Exception:
        # Timed out (or the lookup failed): treat the element as not displayed
        return False
def is_modal_displayed(self):
    return self.is_displayed("//body[@class='modalOpened']")
def close_modal(self):
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if(self.is_modal_displayed()):
        raise Exception("Modal Failed To Close")
I hope this helps you to solve your problem.
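For the infinite-loop problem described in the question, one option is to look the close button up with find_elements, which returns an empty list when nothing matches, so no try/except and no continue are needed. The following is a minimal sketch under assumptions: close_modal_if_present is a hypothetical helper, the XPath is the one from the answer above, and an element being present in the DOM does not guarantee that it is visible or clickable. The locator strategy is given as the string "xpath", which is the value of Selenium's By.XPATH.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH
MODAL_CLOSE = "//div[@class='modal__content']//div[@class='modal__close']"


def close_modal_if_present(driver):
    """Click the modal's close button if it is in the DOM.

    find_elements returns an empty list instead of raising, so the
    caller's loop simply carries on when no modal is shown.
    Returns True if a click was attempted.
    """
    buttons = driver.find_elements(XPATH, MODAL_CLOSE)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Calling this once per iteration, right after clicking next, lets the loop fall through to the next scrape whether or not the overlay appeared.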
Previously, when I scraped multiple pages with Selenium, I just clicked the next-page button and used NoSuchElementException to stop.
But on the site I am facing now, the element always exists; on the last page, clicking the next-page button just reloads the current page.
Can anyone help me stop the script from clicking the same button forever?
items=driver.find_elements_by_class_name('item')
while True:
    try:
        # click next page (the link text '下一页' means "next page")
        driver.find_element_by_link_text('下一页').click()
        sleep(5)
        # scrape data here
        items=driver.find_elements_by_class_name('item')
        for i in range(0, len(items)):
            results.append(items[i])
            print(items[i])
    except NoSuchElementException:
        break
For details of the page, see the picture below.
[Edited]
You can solve it by comparing the current page URL with the URL you land on after clicking the next-page link.
If the URL after the click matches the previous one, you are on the last page; otherwise continue scraping.
Keep a variable that stores the current page URL; after Selenium clicks the next-page link, read the new page URL and compare it with the stored one.
This is what I mean:
url = "https://humkinar.com.pk/"
driver.get(url)
items=driver.find_elements_by_class_name('item')
current_page_url = ""
prev_page_url = url
while True:
    try:
        driver.find_element_by_link_text('下一页').click()
        current_page_url = driver.current_url
        if current_page_url != prev_page_url:
            time.sleep(5)
            items=driver.find_elements_by_class_name('item')
            for i in range(0, len(items)):
                results.append(items[i])
                print(items[i])
            prev_page_url = current_page_url
        else:
            break
    except NoSuchElementException:
        break
As far as I can see in the picture (I assume it shows the last page), you can check for the class 'disable' on the link, as in <a class='disable'>(some Chinese text)</a>, and break.
UPDATE:
items=driver.find_elements_by_class_name('item')
while True:
    try:
        next_link = driver.find_element_by_link_text('下一页')
    except NoSuchElementException:
        break
    # On the last page the link carries the class 'disable', so stop there
    if next_link.get_attribute('class') == 'disable':
        break
    # click next page
    next_link.click()
    sleep(5)
    # scrape data here
    items=driver.find_elements_by_class_name('item')
    for i in range(0, len(items)):
        results.append(items[i])
        print(items[i])
I have been wondering how to tell when I am on the last page of an Amazon listing. I tried to read the last page number at the bottom of the screen, but nothing worked, so I tried a different approach: checking whether the 'Next' button can still be clicked. Here is what I have so far. Any ideas on why it won't go to the next page?
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium import webdriver
import time
def next():
    giveawayPage = 1
    currentPageURL = 'https://www.amazon.com/ga/giveaways?pageId=' + str(giveawayPage)
    while True:
        try:
            nextButton = driver.find_element_by_xpath('//*[@id="giveawayListingPagination"]/ul/li[7]')
        except:
            nextPageStatus = 'false'
            print('false')
        else:
            nextpageStatus = 'true'
            giveawayPage = giveawayPage + 1
            driver.get(currentPageURL)
        if (nextPageStatus == 'false'):
            break
if __name__ == '__main__':
    driver = webdriver.Chrome('./chromedriver')
    driver.get('https://www.amazon.com/ga/giveaways?pageId=1')
    next()
The reason this doesn't work is because if you go to the last page of an Amazon Giveaway, the element that you're selecting is still there, it's just not clickable. On most pages, the element looks like:
<li class="a-last">...</li>
On the last page, it looks instead like:
<li class="a-disabled a-last">...</li>
So rather than checking if the element exists, it might be better to check if the element has the class 'a-disabled'.
The li for the Next button always exists. There are several ways to check whether you are on the last page:
check whether the li has only the class a-last, or also the class a-disabled
//li[@class="a-last"]
check whether the <a> element exists inside the li
//*[@id="giveawayListingPagination"]/ul/li[7]/a
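To illustrate why both checks tell the two cases apart, here is a small offline sketch that runs the same kind of XPath against simplified markup with Python's built-in xml.etree.ElementTree. The two markup strings are stand-ins based on the snippets quoted above, not the real Amazon page, the absence of an <a> inside the disabled item is an assumption taken from the second check, and has_next is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the two variants of the pagination markup
MIDDLE_PAGE = "<ul><li class='a-last'><a href='#'>Next</a></li></ul>"
LAST_PAGE = "<ul><li class='a-disabled a-last'>Next</li></ul>"


def has_next(markup):
    """True if the markup contains an enabled Next item with a link inside."""
    root = ET.fromstring(markup)
    # [@class='a-last'] is an exact match on the whole attribute value,
    # so it does not match class="a-disabled a-last".
    enabled = root.findall(".//li[@class='a-last']")
    links = root.findall(".//li/a")
    return bool(enabled) and bool(links)


print(has_next(MIDDLE_PAGE))
print(has_next(LAST_PAGE))
```

With Selenium the same idea applies: use find_elements with one of these XPaths and treat an empty result as the last page.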
I'm new to selenium and webscraping and I'm trying to get information from the link: https://www.carmudi.com.ph/cars/civic/distance:50km/?sort=suggested
Here's a snippet of the code I'm using:
while max_pages > 0:
    results.extend(extract_content(driver.page_source))
    next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
    driver.execute_script('arguments[0].click();', next_page)
    max_pages -= 1
When I try to print results, I always get (max_pages) of the same results from page 1. The "Next page" button is visible in the page and when I try to find elements of the same class, it only shows 1 element. When I try getting the element by the exact xpath and performing the click action on it, it doesn't work as well. I enclosed it in a try-except block but there were no errors. Why might this be?
You are making this more complicated than it needs to be. There's no point in using JS clicks here... just use the normal Selenium clicks.
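while True:
    # do stuff on the page
    # find_elements returns an empty list (instead of raising) when there is no match
    next_links = driver.find_elements_by_css_selector("a[title='Next page']")
    if next_links:
        next_links[0].click()
    else:
        break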
replace:
next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
driver.execute_script('arguments[0].click();', next_page)
with:
driver.execute_script('next = document.querySelector(".next-page"); next.click();')
If you try next = document.querySelector(".next-page"); next.click(); in console you can see it works.