Selenium click on a next-page link not loading the next page - python

I'm new to selenium and webscraping and I'm trying to get information from the link: https://www.carmudi.com.ph/cars/civic/distance:50km/?sort=suggested
Here's a snippet of the code I'm using:
while max_pages > 0:
results.extend(extract_content(driver.page_source))
next_page = driver.find_element_by_xpath('//div[#class="next-page"]')
driver.execute_script('arguments[0].click();', next_page)
max_pages -= 1
When I try to print results, I always get (max_pages) of the same results from page 1. The "Next page" button is visible in the page and when I try to find elements of the same class, it only shows 1 element. When I try getting the element by the exact xpath and performing the click action on it, it doesn't work as well. I enclosed it in a try-except block but there were no errors. Why might this be?

You are making this more complicated than it needs to be. There's no point in using JS clicks here... just use the normal Selenium clicks.
while True:
# do stuff on the page
next = driver.find_element_by_css_selector("a[title='Next page']")
if next
next.click()
else
break

replace:
next_page = driver.find_element_by_xpath('//div[#class="next-page"]')
driver.execute_script('arguments[0].click();', next_page)
with:
driver.execute_script('next = document.querySelector(".next-page"); next.click();')
If you try next = document.querySelector(".next-page"); next.click(); in console you can see it works.

Related

How to update a function's input by its own output

I am scraping a web page with a table that has multiple pages. I have a function that finds the next page button and hits it. The function needs to return to the main table page to do that. I have the link to that main table page hardcoded in a variable.
Once I moved to e.g page number 2, how do I update that table page link to the new page link? so that once it's done going inside elements of the table, it'll go back to the 2nd-page link and move on from there, not the first page.
def nextPage(driver, desiredPage):
driver.get(desiredPage)
time.sleep(55)
button = driver.find_element_by_xpath(XPATH WRITTEN HERE)
driver.execute_script("arguments[0].click();", button)
time.sleep(45)
next_page = driver.current_url
return next_page
You can use a while loop, to call the method again and again. You may choose a condition on page_nb or another variable to count, to avoid infinite code to run
page_nb = 1
while page_nb != 99:
page_nb = nextPage(driver, page_nb)

Clear search bar using selenium

Im using selenium to check if FB pages exist. When i enter the page title in the search bar it works fine but after the second loop the name of the page gets attached to the preview search and i cant find a way to clear the previous search.
For example it looks for
xyz for the first time
then it looks for
xyzabc when i just want to look for abc this time.
How can i clear the search bar so i can just enter the input without the previous input?
Here is my code
for page_target in df.page_name.values:
time.sleep(3)
inputElement = driver.find_element_by_name("q")
inputElement.send_keys(page_target)
inputElement.submit()
time.sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser').get_text()
title = soup.find(page_target)
#if page exists add 1 to the dic otherwise -1
if title > 0:
dic_holder[page_target] = 1
else:
dic_holder[page_target] = -1
driver.find_element_by_name("q").clear()
time.sleep(3)
You can use
WebElement.clear();//to clear the previous search item
WebElement.sendkeys(abc);//to insert the new search
Also I guess you have a sticky search in your application hence I recommend you to use this method everytime you insert something in the searchbox
Few ways to do it:
Use element.clear(). I see that you already tried in your code, not sure how it didn't work but I guess it is not text box or input element?
Use javascript: driver.execute_script('document.getElementsByName("q")[0].value=""');
Emulate Ctrl+A?
from selenium.webdriver.common.keys import Keys
elem.send_keys(Keys.CONTROL, 'a')
elem.send_keys("page 1")

Selenium clicking to next page until on last page

I am trying to keep clicking to next page on this website, each time appending the table data to a csv file and then when I reach the last page, append the table data and break the while loop
Unfortunately, for some reason it keeps staying on the last page, and I have tried several different methods to catch the error
while True:
try :
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
except :
print("No more pages left")
break
driver.quit()
I also tried this one:
try:
link = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.pagination-next a')))
driver.execute_script('arguments[0].scrollIntoView();', link)
link.click()
except:
keep_going = False
I've tried putting print statements in and it just keeps staying on the last page.
Here is the HTML for the first page/last page for the next button, I'm not sure if I could do something utilizing this:
HTML for first page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next" style="">Next</li>
Next
</li>
HTML for last page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next disabled" style="">Next</li>
Next
</li>
You can solve the problem as below,
Next Button Will be enabled until the Last Page and it will be disabled in the Last Page.
So, you can create two list to find the enabled button element and disabled button element. At any point of time,either enabled element list or disabled element list size will be one.So, If the element is disabled, then you can break the while loop else click on the next button.
I am not familiar with python syntax.So,You can convert the below java code and then use it.It will work for sure.
Code:
boolean hasNextPage=true;
while(hasNextPage){
List<WebElement> enabled_next_page_btn=driver.findElements(By.xpath("//li[#class='pagination-next']/a"));
List<WebElement> disabled_next_page_btn=driver.findElements(By.xpath("//li[#class='pagination-next disabled']/a"));
//If the Next button is enabled/available, then enabled_next_page_btn size will be one.
// So,you can perform the click action and then do the action
if(enabled_next_page_btn.size()>0){
enabled_next_page_btn.get(0).click();
hasNextPage=true;
}else if(disabled_next_page_btn.size()>0){
System.out.println("No more Pages Available");
break;
}
}
The next_page_btn.index(0).click() wasn't working, but checking the len of next_page_btn worked to find if it was the last page, so I was able to do this.
while True:
next_page_btn = driver.find_elements_by_xpath("//li[#class = 'pagination-next']/a")
if len(next_page_btn) < 1:
print("No more pages left")
break
else:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
Thanks so much for the help!
How about using a do/while loop and just check for the class "disabled" to be included in the attributes of the next button to exit out? (Excuse the syntax. I just threw this together and haven't tried it)
string classAttribute
try :
do
{
IWebElement element = driver.findElement(By.LINK_TEXT("Next"))
classAttribute = element.GetAttribute("class")
element.click()
}
while(!classAttribute.contains("disabled"))
except :
pass
driver.quit()
xPath to the button is:
//li[#class = 'pagination-next']/a
so every time you need to load next page you can click on this element:
next_page_btn = driver.find_elements_by_xpath("//li[#class = 'pagination-next']/a")
next_page_btn.index(0).click()
Note: you should add a logic:
while True:
next_page_btn = driver.find_elements_by_xpath("//li[#class = 'pagination-next']/a")
if len(next_page_btn) < 1:
print("No more pages left")
break
else:
# do stuff

python selenium firefox behavior

I am using a firefox browser with selenium. I am scraping a website that has multiple pages like google search, where you can pick the page at the bottom. On each page, I click an element, like google again, and scrap data from that element's information. If I am at an element's information on the third page, and click the back button using my regular firefox browser, it goes back to the third page. But, when I press the back button in selenium with driver.back(), it takes me back to the first page. Anyone know how to fix this?
count = 1
while 1:
try:
pages = driver.find_elements_by_css_selector("a.page-number.gradient")
except:
break
for page in pages:
if page.text==str(count):
page.click()
print count
break
states = driver.find_elements_by_xpath("//*[#id='table_div']/div/div/table/tbody/tr/td[19]")
fails = []
i = 1
for state in states:
if state.text == "FAILED":
fails.append(i)
i+=1
for fail in fails:
print driver.find_element_by_xpath("//*[#id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[19]").text
driver.find_element_by_xpath("//*[#id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[1]/input").click()
time.sleep(2)
errors = driver.find_element_by_name("errors")
if "\n" in errors.text:
fixedText = errors.text.split("\n")[0]
errors.clear()
errors.send_keys(fixedText)
time.sleep(1)
driver.find_element_by_name('post_type').click()
time.sleep(5)
driver.switch_to_alert().accept()
driver.switch_to_alert().accept()
driver.back()
driver.back()
else:
driver.back()
driver.switch_to_alert().accept()
driver.switch_to_alert().accept()
count+=1
The code is really complicated, but basically it's the driver.back() lines that aren't working

Selenium Python - Access next pages of search results

I have to click on each search result one by one from this url:
Search Guidelines
I first extract the total number of results from the displayed text so that I can set the upper limit for iteration
upperlimit=driver.find_element_by_id("total_results")
number = int(upperlimit.text.split(' ')[0])
The loop is then defiend as
for i in range(1,number):
However, after going through the first 10 results on the first page, list index goes out of range (probably because there are no more links to click). I need to click on "Next" to get the next 10 results, and so on till I'm done with all search results. How can I go around doing that?
Any help would be appreciated!
The problem is that the value of element with id total_results changes after the page is loaded, at first it contains 117, then changes to 44.
Instead, here is a more robust approach. It processes page by page until there is no more pages left:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
driver = webdriver.Firefox()
url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
driver.get(url)
page_number = 1
while True:
try:
link = driver.find_element_by_link_text(str(page_number))
except NoSuchElementException:
break
link.click()
print driver.current_url
page_number += 1
Basically, the idea here is to get the next page link, until there is no such ( NoSuchElementException would be thrown). Note that it would work for any number of pages and results.
It prints:
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
There is not even the need to programatically press on the Next button, if you see carrefully, the url just needs a new parameter when browsing other result pages:
url = "http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page={}#showfilter"
for i in range(1,5):
driver.get(url.format(i))
upperlimit=driver.find_element_by_id("total_results")
number = int(upperlimit.text.split(' ')[0])
if you still want to programatically press on the next button you could use:
driver.find_element_by_class_name('next').click()
But I haven't tested that.

Categories