I am parsing a web page with a structure like this:
<nav class="sidebar-main">
<div class="sidebar">Found 3 targets</div>
<ul><li><span>target1</span></li>
<li><a href="#target2" ><span>target2</span></a></li>
<li><span>target3</span></li></ul>
</nav>
My goal is to loop through each list element, clicking each one in the process:
sidebar = browser.find_element_by_class_name('sidebar-main')
elementList = sidebar.find_elements_by_tag_name("li")
for sample in elementList:
    browser.implicitly_wait(5)
    run_test1 = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'sidebar-main'))
    )
    sample.click()
I keep getting the error:
Message: The element reference of <li> is stale; either the element is no longer attached to the DOM or the page has been refreshed.
Right now only one link is clicked; evidently Selenium cannot locate the subsequent elements once the page refreshes. How do I get around this?
Once you click on the first link, either navigation to a new page happens or the page is refreshed. You need to find the list elements again and then click on the required element. If the page has changed, you also need to navigate back to the original page.
You can try something like this:
sidebar = browser.find_element_by_class_name('sidebar-main')
elementList = sidebar.find_elements_by_tag_name("li")
for i in range(len(elementList)):
    # re-locate the list on every iteration so the reference is never stale
    element = browser.find_element_by_class_name('sidebar-main').find_elements_by_tag_name("li")[i]
    element.click()
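If each click navigates to a new page, a variation that re-finds the list and then returns might look like this (a minimal sketch, assuming browser.back() restores the original listing):
count = len(browser.find_element_by_class_name('sidebar-main')
                   .find_elements_by_tag_name("li"))
for i in range(count):
    # re-locate the <li> fresh on every pass
    item = (browser.find_element_by_class_name('sidebar-main')
                   .find_elements_by_tag_name("li")[i])
    item.click()
    browser.back()  # return to the listing page before the next iteration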
Related
Trying to scrape a website, I created a loop and was able to locate all the elements. My problem is that the next button's id changes on every page, so I cannot use the id as a locator.
This is the next button on page 1:
<a rel="nofollow" id="f_c7" href="#" class="nextLink jasty-link"></a>
And this is the next button on page 2:
<a rel="nofollow" id="f_c9" href="#" class="nextLink jasty-link"></a>
Idea:
next_button = browser.find_elements_by_class_name("nextLink jasty-link")
next_button.click
I get this error message:
Message: no such element: Unable to locate element
The problem here might be that there are two next buttons on the page.
So I tried to create a list but the list is empty.
next_buttons = browser.find_elements_by_class_name("nextLink jasty-link")
print(next_buttons)
Any idea on how to solve my problem? Would really appreciate it.
This is the website:
https://fazarchiv.faz.net/faz-portal/faz-archiv?q=Kryptow%C3%A4hrungen&source=&max=10&sort=&offset=0&_ts=1657629187558#hitlist
There are two issues in my opinion:
Depending on where you access the site from, there is a cookie banner that will swallow the click, so you may have to accept it first:
browser.find_element_by_class_name('cb-enable').click()
To locate a single element (either of the two next buttons, it does not matter which), use browser.find_element() instead of browser.find_elements().
To select your element by multiple class names, use XPath:
next_button = browser.find_element(By.XPATH, '//a[contains(@class, "nextLink jasty-link")]')
or css selectors:
next_button = browser.find_element(By.CSS_SELECTOR, '.nextLink.jasty-link')
Note: to avoid the DeprecationWarning "find_element_by_* commands are deprecated. Please use find_element() instead", additionally import: from selenium.webdriver.common.by import By
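Put together, a minimal sketch using the non-deprecated API (the driver setup is an assumption for illustration; the class names and URL come from the question and answers above):
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("https://fazarchiv.faz.net/faz-portal/faz-archiv?q=Kryptow%C3%A4hrungen&source=&max=10&sort=&offset=0&_ts=1657629187558#hitlist")

# accept the cookie banner first, otherwise it swallows the click
browser.find_element(By.CLASS_NAME, 'cb-enable').click()

# a single next button, located via a compound-class CSS selector
next_button = browser.find_element(By.CSS_SELECTOR, '.nextLink.jasty-link')
next_button.click()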
You can't pass multiple class names to find_elements_by_class_name, so use find_elements_by_css_selector instead:
next_buttons = browser.find_elements_by_css_selector(".nextLink.jasty-link")
print(next_buttons)
You can then loop through the list and click the buttons:
next_buttons = browser.find_elements_by_css_selector(".nextLink.jasty-link")
for button in next_buttons:
    button.click()
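One hedged caveat: if the click triggers a page load, every reference collected before the click goes stale. Re-finding the button on each pass avoids that (a sketch, assuming the button disappears on the last page):
while True:
    buttons = browser.find_elements_by_css_selector(".nextLink.jasty-link")
    if not buttons:
        break  # no next button found, presumably the last page
    buttons[0].click()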
Try the XPath below:
//a[contains(@class, 'step jasty-link')]/following-sibling::a
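In context, that XPath might be used like this (a sketch; By must be imported as noted above):
next_button = browser.find_element(
    By.XPATH, "//a[contains(@class, 'step jasty-link')]/following-sibling::a")
next_button.click()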
I want to scrape a web page and the structure looks like this:
<iframe>
#document
<html>
......
</html>
</iframe>
I need to step into the "html" and click the button, but I can't find a way to go inside.
Is there any method to click a button inside the "#document" node?
In order to access elements inside an iframe you have to switch to that iframe.
In case this is the only iframe on the page it can be done with:
driver.switch_to.frame(driver.find_element_by_xpath('//iframe'))
In case there are several iframes there, you should locate the desired iframe the same way you locate any other web element.
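For example, a sketch along these lines (the id value "content-frame" is a made-up example):
# locate the desired iframe like any other element, then switch into it;
# "content-frame" is a hypothetical id for illustration
frame = driver.find_element_by_xpath('//iframe[@id="content-frame"]')
driver.switch_to.frame(frame)

driver.find_element_by_xpath('//button').click()  # interact inside the frame

driver.switch_to.default_content()  # back to the top-level document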
This is very similar to how we scrape the DOM for other tags.
<html>
<body>
....
</body>
</html>
We usually write it like '//div[@tag-name="<<value>>"]'. Very similarly, we handle the frame as well.
Instead of accessing the elements directly, we switch to the frame and then access them, as below (switch_to_frame is deprecated in newer Selenium versions, so switch_to.frame is used here):
driver.switch_to.frame("<<frame id or frame name>>")
driver.find_element(By.XPATH, '//div[@tag-name="<<value>>"]').click()
driver.switch_to.default_content()
OR
driver.switch_to.frame("<<frame id or frame name>>")
element = driver.find_element_by_xpath('//div[@tag-name="<<value>>"]')
element.click()
driver.switch_to.default_content()
I'm using Selenium to navigate pages in a scraping project. This is the HTML:
<input type="hidden" id="day_nr" value="2"/>
<div id="js-table" class="js-table table">
Day 2 of 2
<div class="js-pager">
<input id="myCustomUrl" name="myCustomUrl" type="hidden" value="/Ranking/Rankings"/>
<div class="pagination-container">
<ul class="pagination">
<li class=""><a class="days" id="day_1">Day 1</a></li>
<li class="active"><a class="days" id="day_2">Day 2</a></li>
</ul>
When I click to go to the next page, I need to wait a few seconds before calling the scrape function otherwise the table won't be loaded and I'll simply scrape data from the previous page. It seems to me I should be able to do this in 3 ways:
1) Using the input element <input type="hidden" id="day_nr" value="2"/>:
element = wait.until(EC.text_to_be_present_in_element_value((By.ID, 'day_nr'), '2'))
2) Using the div element just below that <div id="js-table" class="js-table table">
Day 2 of 2:
element = wait.until(EC.text_to_be_present_in_element((By.ID, 'js-table'), 'Day 2 of 2'))
3) Using the list element with the "active" class <li class="active"><a class="days" id="day_2">Day 2</a></li>:
element = wait.until(EC.text_to_be_present_in_element((By.CLASS_NAME, 'active'), 'Day 2'))
All of these run without any error, yet the program still scrapes the data from the first page rather than the second. Alternatively, I've created a while loop that makes the program sleep until the element with the "active" class matches the day I intend to scrape; this works just fine, but it would be a lot cleaner if I could get the explicit wait to work.
Any ideas what I'm doing wrong?
The best way I've found to do this is to wait for an element to go stale. A stale element is one that is no longer attached to the DOM of the page. For example, you'll get a stale element exception if you locate an element on the page and store it in a variable, click something that navigates to a new page or reloads the current page, and then try to interact with the variable you declared earlier.
You can use that to let you know when the page has reloaded: find and store an element on the page, navigate to the next page, wait for the element to go stale, and then continue the script. It would look something like this.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
e = driver.find_element(By.ID, 'day_nr')  # grab an element from the current page
something.click()  # navigate to the new page
wait.until(EC.staleness_of(e))  # once e is stale, you know you are loading the new page
# now you are ready to scrape the next page
# ... do stuff
For more info, see the docs or the APIs.
I'm trying to use Selenium and BeautifulSoup to "click" a javascript:void link. The return of find_element_by_link_text is not None; however, nothing in browser.page_source is updated afterwards, so I am not sure whether the crawl succeeded.
Here is the result using
PageTable = soup.find('table',{'id':'rzrqjyzlTable'})
print(PageTable)
<table class="tab1" id="rzrqjyzlTable">
<div id="PageNav" class="PageNav" style="">
<div class="Page" id="PageCont">
Previous3<span class="at">1</span>
2
3
4
5
...
45
Next Page
<span class="txt"> Jump</span><input class="txt" id="PageContgopage">
<a class="btn_link">Go</a></div>
</div>
The code for clicking next page is shown below
try:
    page = browser.find_element_by_link_text(u'Next Page')
    page.click()
    browser.implicitly_wait(3)
except NoSuchElementException:
    print("NoSuchElementException")
soup = BeautifulSoup(browser.page_source, 'html.parser')
PageTable = soup.find('table', {'id': 'rzrqjyzlTable'})
print(PageTable)
I am expecting that browser.page_source should be updated
My guess is that you are pulling the source before the page (or subpage) is done reloading. I would try grabbing the Next Page button, clicking it, waiting for it to go stale (which indicates the page is reloading), and then pulling the source.
wait = WebDriverWait(browser, 10)  # assumes the usual WebDriverWait and EC imports
page = browser.find_element_by_link_text(u'Next Page')
page.click()
wait.until(EC.staleness_of(page))
# the page should be loading/loaded at this point
# you may need to wait for a specific element to appear to ensure that it's loaded properly since it doesn't seem to be a full page load
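Following up on that last comment, a hedged sketch that also waits for the table to be re-attached before parsing (the table id comes from the question; the 10-second timeout is an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)
page = browser.find_element_by_link_text(u'Next Page')
page.click()
wait.until(EC.staleness_of(page))  # the old pager link has been torn down
wait.until(EC.presence_of_element_located((By.ID, 'rzrqjyzlTable')))  # new table attached
soup = BeautifulSoup(browser.page_source, 'html.parser')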
After clicking on Next Page, you can reload the web page.
Code:
driver.refresh()
Or using the JavaScript executor:
driver.execute_script("location.reload()")
After that, try to get the page source the way you are already doing.
Hope this will help.
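In context, the suggestion would look roughly like this (a sketch; whether the AJAX pagination state survives a reload is something to verify for this site):
page = browser.find_element_by_link_text(u'Next Page')
page.click()
browser.refresh()  # force a reload so page_source reflects the new state
soup = BeautifulSoup(browser.page_source, 'html.parser')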
I get "element is no longer attached to the dom" exception even though the element is there and is clickable, I am trying to click the "next" arrow on ryanair website, the html for the next button is:
<li class="newer" ng-class="{'loadingsmall':loading}">
<a ng-disabled="loading" ng-click="loadMore(0, 'SelectInput$LinkButtonNext1', 1)"
title="Next Week" href="">
</a>
</li>
I have located and clicked it using several methods:
elem = WebDriverWait(browser, 15).until(EC.element_to_be_clickable((By.XPATH,
    "//a[@title='Next Week']")))
elem = browser.find_element_by_xpath("//a[@title='Next Week']")
elem.click()
and:
area = browser.find_element_by_xpath("//a[@title='Next Week']")
action = webdriver.ActionChains(browser)
action.move_to_element(area)
action.click(area)
action.perform()
and:
elem = browser.find_element_by_link_text('>')
elem.click()
All of these work fine if I have no action in between, but once I tell Selenium to click on other elements on the page (I do not move to other pages; I stay on the same page and show some dynamic content), the "next" link only works the first time around and then gives me the exception. Help would be so greatly appreciated! :)
You're still on the same page but the DOM's elements have changed due to some AJAX actions & calls, so you're forced to re-detect your object using:
elem = browser.find_element_by_link_text('>')
If you don't do that, it is more than likely that you'll get a
StaleElementReferenceException
For more info, see the Selenium documentation on stale element references.
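A hedged sketch of that re-detection pattern, wrapped in a small retry loop (the link text '>' comes from the snippets above; the helper name and retry count are made up for illustration):
from selenium.common.exceptions import StaleElementReferenceException

def click_next(browser, attempts=3):
    # re-find the 'next' link on every attempt so a stale reference
    # from an intervening AJAX update never kills the click
    for _ in range(attempts):
        try:
            browser.find_element_by_link_text('>').click()
            return True
        except StaleElementReferenceException:
            continue  # DOM changed between find and click; re-detect
    return False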