Selenium Explicit Wait for Pagination - Python

I'm using Selenium to navigate pages in a scraping project. This is the HTML:
<input type="hidden" id="day_nr" value="2"/>
<div id="js-table" class="js-table table">
Day 2 of 2
<div class="js-pager">
<input id="myCustomUrl" name="myCustomUrl" type="hidden" value="/Ranking/Rankings"/>
<div class="pagination-container">
<ul class="pagination">
<li class=""><a class="days" id="day_1">Day 1</a></li>
<li class="active"><a class="days" id="day_2">Day 2</a></li>
</ul>
When I click to go to the next page, I need to wait a few seconds before calling the scrape function; otherwise the table won't be loaded yet and I'll simply scrape data from the previous page. It seems to me I should be able to do this in three ways:
1) Using the input element <input type="hidden" id="day_nr" value="2"/>:
element = wait.until(EC.text_to_be_present_in_element_value((By.ID, 'day_nr'), '2'))
2) Using the div element just below that <div id="js-table" class="js-table table">
Day 2 of 2:
element = wait.until(EC.text_to_be_present_in_element((By.ID, 'js-table'), 'Day 2 of 2'))
3) Using the list element with the "active" class <li class="active"><a class="days" id="day_2">Day 2</a></li>:
element = wait.until(EC.text_to_be_present_in_element((By.CLASS_NAME, 'active'), 'Day 2'))
All of these run without any error, yet the program still scrapes the data from the first page rather than the second. Alternatively, I've created a while loop that makes the program sleep until the element with the "active" class matches the day I intend to scrape; this works just fine, but it would be a lot cleaner if I could get the explicit wait to work.
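For reference, the while-loop workaround looks roughly like this (a sketch; "Day 2" stands in for whichever day I'm waiting on, and driver is my WebDriver instance):
import time

# poll until the pager's active item shows the day I want
while 'Day 2' not in driver.find_element_by_class_name('active').text:
    time.sleep(1)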
Any ideas what I'm doing wrong?

The best way I've found to do this is to wait for staleness. A stale element is one that is no longer attached to the DOM of the page. For example, you'll get a stale element exception if you locate an element on the page and store it in a variable, click something that navigates to a new page or reloads the current page, and then try to interact with the variable you declared earlier.
You can use that to detect when the page has reloaded: find and store an element on the current page, navigate to the next page, wait for the stored element to go stale, and then continue the script. It would look something like this:
e = driver.find_element(By.ID, 'day_nr')  # grab an element from the current page
something.click()  # navigate to the new page
wait.until(EC.staleness_of(e))  # once e is stale, you know the new page is loading
# now you are ready to scrape the next page
# ... do stuff
For more info, see the docs or the APIs.
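Applied to the pagination in the question, a sketch might look like this (assuming the usual support imports; scrape() is a hypothetical stand-in for the question's scrape function):
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)
old_table = driver.find_element(By.ID, 'js-table')  # element from the current page
driver.find_element(By.ID, 'day_1').click()         # switch to the other day
wait.until(EC.staleness_of(old_table))              # old table detached: new content is loading
wait.until(EC.presence_of_element_located((By.ID, 'js-table')))  # new table is attached
scrape()  # hypothetical: the question's scrape function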

Related

How do I click and open a collection of objects in an element in Python Selenium without closing and reopening the browser for each element

Say I have an e-commerce site I want to scrape, and I am interested in the top ten trending products. When I dig into the HTML it looks like this:
<div>
<div>
<span>
<a href='www.mysite/products/1'>
Product 1
</a>
</span>
</div>
<div>
<span>
<a href='www.mysite/products/2'>
Product 2
</a>
</span>
</div>
<div>
<span>
<a href='www.mysite/products/3'>
Product 3
</a>
</span>
</div>
<div>
<span>
<a href='www.mysite/products/4'>
Product 4
</a>
</span>
</div>
</div>
My first solution was to extract the href attributes and store them in a list; then I would open a browser instance for each attribute. But that comes at a cost, as I have to close and reopen the browser, and every time I open it I have to authenticate. I then tried a second solution. In solution two, the outer div is the parent, and following the Selenium way of doing things I stored the products as follows:
product_1 = driver.find_element_by_xpath("//div/div[1]")
product_2 = driver.find_element_by_xpath("//div/div[2]")
product_3 = driver.find_element_by_xpath("//div/div[3]")
product_4 = driver.find_element_by_xpath("//div/div[4]")
So my objective is to search for a product, and after getting the list, target each box's a tag, click it, extract more details on the product, and then go back without closing the browser until my list is finished. Below is my solution:
for i in range(10):
    try:
        num = i + 1
        path = f"//div/div[{num}]/span/a"
        product_click = driver.find_element_by_xpath(path)
        driver.execute_script("arguments[0].click();", product_click)
        scrape_product_detail()  # function that scrapes the whole product detail
        driver.execute_script("window.history.go(-1)")  # go backwards to continue looping
    except NoSuchElementException:
        print('Element not found')
The problem is that it works for the first product: it scrapes all the detail and then goes back. But despite returning to the product page, the program fails to find the second element and those coming afterwards, and I am failing to understand what the problem may be. Could you kindly assist? Thanks.
Thanks @Debenjan, you did help me a lot there. Your solution is working like a charm. For those who want to know how I went about it, here is the code:
article_elements = self.find_elements_by_class_name("s-card-image")
collection = []
for news_box in article_elements:
    # pull the product link from each card's anchor
    slug = news_box.find_element_by_tag_name('a').get_attribute('href')
    collection.append(slug)
for i in range(len(collection)):
    self.execute_script("window.open()")  # open a new tab
    self.switch_to.window(self.window_handles[i + 1])  # switch to it
    url = collection[i]
    self.get(url)
    print(self.title, url, self.current_url)
@A D, thanks so much, your solution is working too; I will just have to test which strategy is best and go with it. Thanks a lot, guys.
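For reference, an alternative that stays in a single window is to collect the hrefs first and then visit each URL directly, so no element reference ever goes stale. A sketch, assuming a plain driver variable and the question's scrape_product_detail function:
links = [a.get_attribute('href')
         for a in driver.find_elements_by_xpath("//div/div/span/a")]
for url in links:
    driver.get(url)           # same session, so no re-authentication needed
    scrape_product_detail()   # the question's detail-scraping function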

Python & Selenium - Find element by label text

I'm trying to locate and click an element (a checkbox) from a big selection of checkboxes on an HTML site using Python and Selenium WebDriver. The HTML code looks like this:
HTML Code
<div class="checkbox-inline col-md-5 col-lg-3 col-sm-6 m-l-sm rightCheckBox">
<input type="checkbox" checked="checked" class="i-checks" name="PanelsContainer:tabsContentView:5:listTabs:rights-group-container:right-type-view:2:right-view:2:affected-right" disabled="disabled" id="id199"> <label>Delete group</label>
</div>
My problem is that the only unique identifier is:
<label>Delete group</label>
All other elements/IDs/names are used by other checkboxes or change from page to page.
I have tried the following code:
driver.find_element_by_xpath("//label[contains(text(), 'Delete group')]").click()
But I only get an error when using this:
Error: element not interactable
Anyone able to help with this?
Try the below XPath:
//label[contains(text(), 'Delete group')]//ancestor::div//input
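Used from Python, that might look like this (a sketch with the question's driver variable):
checkbox = driver.find_element_by_xpath(
    "//label[contains(text(), 'Delete group')]//ancestor::div//input")
checkbox.click()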
Try with JavaScript:
checkBox = driver.find_element_by_xpath("//label[text()='Delete group']//ancestor::div//input")
# scroll to the checkbox if it's not on screen
driver.execute_script("arguments[0].scrollIntoView();", checkBox)
driver.execute_script("arguments[0].click();", checkBox)
Note: As per the HTML you shared, the checkbox is in a disabled state, so I am not sure the click will trigger any action. However, the above code will click your checkbox.

How can I access bulk-edit button "bulkedit_all"? Python / Selenium

I am trying to automate Jira tasks but am struggling to access the bulk-edit option after a JQL filter. After reaching the correct screen I am stuck at this point:
(screenshot of the Bulk Change dropdown omitted)
HTML code:
<div class="aui-list">
<h5>Bulk Change:</h5>
<ul class="aui-list-sectionaui-first aui-last">
<li class="aui-list-item active">
<a class="aui-list-item-link" id="bulkedit_all" href="/secure/views/bulkedit/BulkEdit1!default.jspa?reset=true&tempMax=4">all 4 issue(s)</a>
</li>
</ul>
</div>
My Python code:
bulkDropdown = browser.find_elements_by_xpath("//div[#class='aui-list']//aui-list[#class='aui-list-item.active']").click()
Try the following XPath:
bulkDropdown = browser.find_elements_by_xpath("//li/a[#id='bulkedit_all']").click()
The link you want has an ID; you should use that unless you find that it's not unique on the page.
browser.find_element_by_id("bulkedit_all").click()
You will likely need to add a wait for clickable, since from the screenshot it looks like a popup or tooltip of some kind. See the docs for more info on the different waits available.
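A sketch of that wait, assuming the standard support imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait until the link is clickable, then click it
WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.ID, "bulkedit_all"))
).click()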

How to crawl a web page using Selenium - find_element_by_link_text

I'm trying to use Selenium and BeautifulSoup to "click" a javascript:void link. The return of find_element_by_link_text is not None. However, nothing is updated when reviewing browser.page_source, and I am not sure whether the crawl succeeded or not.
Here is the result of using:
PageTable = soup.find('table',{'id':'rzrqjyzlTable'})
print(PageTable)
<table class="tab1" id="rzrqjyzlTable">
<div id="PageNav" class="PageNav" style="">
<div class="Page" id="PageCont">
Previous3<span class="at">1</span>
2
3
4
5
...
45
Next Page
<span class="txt"> Jump</span><input class="txt" id="PageContgopage">
<a class="btn_link">Go</a></div>
</div>
The code for clicking the next page is shown below:
try:
    page = browser.find_element_by_link_text(u'Next Page')
    page.click()
    browser.implicitly_wait(3)
except NoSuchElementException:
    print("NoSuchElementException")
soup = BeautifulSoup(browser.page_source, 'html.parser')
PageTable = soup.find('table',{'id':'rzrqjyzlTable'})
print(PageTable)
I am expecting browser.page_source to be updated.
My guess is that you are pulling the source before the page (or sub-page) is done reloading. I would try grabbing the Next Page button, clicking it, waiting for it to go stale (which indicates the page is reloading), and then pulling the source.
page = browser.find_element_by_link_text(u'Next Page')
page.click()
WebDriverWait(browser, 10).until(EC.staleness_of(page))  # once the old link is stale, the page is reloading
# the page should be loading/loaded at this point
# you may need to wait for a specific element to appear to ensure it's loaded properly, since it doesn't seem to be a full page load
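For example, a sketch of that extra wait before re-parsing (assuming the usual support imports and the question's BeautifulSoup setup):
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, 'rzrqjyzlTable')))  # table has reappeared
soup = BeautifulSoup(browser.page_source, 'html.parser')
PageTable = soup.find('table', {'id': 'rzrqjyzlTable'})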
After clicking on Next Page, you can reload the web page.
Code:
driver.refresh()
Or, using the JavaScript executor:
driver.execute_script("location.reload()")
After that, try to get the page source the way you are doing. Hope this helps.

Python Selenium: Looping through hrefs embedded in <li> elements

I am parsing a web page organized like this:
<nav class="sidebar-main">
<div class="sidebar">Found 3 targets</div>
<ul><li><span>target1</span></li>
<li><a href="#target2" ><span>target2</span></a></li>
<li><span>target3</span></li></ul>
</nav>
My goal is to loop through each list element, clicking each one in the process:
sidebar = browser.find_element_by_class_name('sidebar-main')
elementList = sidebar.find_elements_by_tag_name("li")
for sample in elementList:
    browser.implicitly_wait(5)
    run_test1 = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'sidebar-main'))
    )
    sample.click()
I keep getting the error:
Message: The element reference of <li> is stale; either the element is no longer attached to the DOM or the page has been refreshed.
Right now only one link is clicked; evidently Selenium cannot locate the subsequent elements after the page refresh. How do I get around this?
Once you click the first link, either navigation to a new page happens or the page is refreshed. You need to keep track of the element list, find the list elements again, and then click the required element. If the page changes, you need to navigate back to the original page as well.
You can try something like below
sidebar = browser.find_element_by_class_name('sidebar-main')
elementList = sidebar.find_elements_by_tag_name("li")
for i in range(len(elementList)):
    # re-find the list on every iteration so the reference is never stale
    element = browser.find_element_by_class_name('sidebar-main').find_elements_by_tag_name("li")[i]
    element.click()
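If each click navigates to a new page, a sketch of the same idea with a back step (assuming every <li> click leaves the listing page):
elementList = browser.find_element_by_class_name('sidebar-main').find_elements_by_tag_name("li")
for i in range(len(elementList)):
    # re-find on every pass so the reference is never stale
    browser.find_element_by_class_name('sidebar-main').find_elements_by_tag_name("li")[i].click()
    browser.back()  # return to the listing before the next iteration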
