Python, Selenium + stale element reference

I'm trying to go to a webpage,
save a set of links on the page that I would like to click, and then
click on each of those links in a for loop (going back and forth on the page). Here is the code:
from selenium import webdriver

driver = webdriver.Chrome(executable_path='/Applications/chromedriver')
driver.get("webpage link")  # insert link to webpage
list_links = driver.find_elements_by_xpath("//a[contains(@href,'activities')]")
for link in list_links:
    print(link)
    link.click()
    driver.goback()
    driver.implicitly_wait(10)  # seconds
driver.quit()
However, the first time I go back to the homepage I get the error message:
StaleElementReferenceException: stale element reference: element is not attached to the page document.
Can anyone help me understand why, and suggest a solution?
Thank you, much appreciated.

Your list_links works only on the page where it was defined. After you make the first click on a link, the DOM is re-created and the references stored in list_links become invalid. You can apply the solution below:
driver.implicitly_wait(10)  # seconds
list_links = [link.get_attribute('href') for link in driver.find_elements_by_xpath("//a[contains(@href,'activities')]")]
for link in list_links:
    print(link)
    driver.get(link)
    driver.goback()
driver.quit()
P.S. I assume that goback() is a custom method you have already defined, since there is no such method among Selenium's built-ins, only back().
P.P.S. Note that you need to call driver.implicitly_wait(10) only once in your code; it then applies to all subsequent find_element...() calls.

It's simple: you are trying to hold on to HTML elements (the links) that can no longer be referenced by the code (the loop logic) once the page changes, which is why it throws this error. Above all, those are Selenium objects you are trying to save, and you should not do that. Instead, save the actual value of each link in a list and then loop over those, as in the snippet above.

Related

Find element in selenium based on attribute/value in div class

I'm using Selenium in Python to try and scrape multiple pages. IDs and XPaths keep changing per page, so I figured I'd best access elements through their attribute-value combinations (see below).
I'm trying to access the text in the following element:
[screenshot of the target element: https://i.stack.imgur.com/ly1YU.png]
which belongs to the following:
[screenshot of the surrounding HTML: https://i.stack.imgur.com/strep.png]
As I said, the IDs keep changing, so I wanted to access the element by data-fragment-name="articleDetail" or data-testid="article-body". Can somebody show me how to do so?
Thanks in advance!
Try using the following CSS selector:
div[data-fragment-name='articleDetail'] div[data-testid='article-body']
Or this XPath:
//div[@data-fragment-name='articleDetail']//div[@data-testid='article-body']
The Selenium command can look like:
driver.find_element(By.CSS_SELECTOR, "div[data-fragment-name='articleDetail'] div[data-testid='article-body']")
Or
driver.find_element(By.XPATH, "//div[@data-fragment-name='articleDetail']//div[@data-testid='article-body']")
from selenium.webdriver.common.by import By

obj = driver.find_element(By.XPATH, "//div[@data-fragment-name='articleDetail']")
obj2 = driver.find_element(By.XPATH, "//div[@data-testid='article-body']")
where, of course, driver = webdriver.Firefox() (or similar) and you have already navigated to the desired page.
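For completeness, a minimal runnable sketch combining the above (the URL is a placeholder for the actual page):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://example.com/some-article")  # placeholder URL

# Locate the article body via its stable data-* attributes
body = driver.find_element(
    By.CSS_SELECTOR,
    "div[data-fragment-name='articleDetail'] div[data-testid='article-body']")
print(body.text)
driver.quit()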

I'm not sure why this xpath is unable to be located using Python Selenium

I'm trying to scrape every item on a site that's displayed in a grid format with infinite scrolling. However, I'm stuck on even getting the second item using xpath because it's saying:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='el-card advertisement card is-always-shadow'][2]"}
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# 'driver' and 'wait' are assumed to be defined earlier,
# e.g. wait = WebDriverWait(driver, 10)
x = 1
while x < 5:
    time.sleep(5)
    target = driver.find_element_by_xpath(f"//div[@class='el-card advertisement card is-always-shadow'][{x}]")
    target.click()
    wait.until(EC.visibility_of_element_located((By.ID, "details")))
    print(driver.current_url)
    time.sleep(5)
    driver.back()
    time.sleep(5)
    WebDriverWait(driver, 3).until(EC.title_contains("Just Sold"))
    time.sleep(5)
    x += 1
With my f-string XPath it's able to find the first div with that class and print the URL, but the moment it completes one iteration of the while loop, it fails to find the second div with that class (x = 2).
I've tried monitoring it with all the time.sleep() calls to see exactly where it fails; I thought maybe it was running before the page loaded, but even giving every page ample time to finish loading, I can't find the issue.
This is the structure of the HTML on that page (screenshots omitted): there is a class of that name (as well as "el-card__body", which I have also tried) within each div, one for each item being displayed.
Thank you for the help in advance!
(disclaimer: this is for a research paper, I do not plan on selling/benefiting off of the information being collected)
Storing the items in a list using find_elements_by_xpath and then iterating through them did the trick, as suggested by tdelaney.
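For reference, a rough sketch of that approach (my reading of the suggestion, with the class name taken from the question): re-collect the elements after each driver.back() so a stale reference is never clicked.

import time

cards = driver.find_elements_by_xpath("//div[@class='el-card advertisement card is-always-shadow']")
for i in range(len(cards)):
    # Re-find the cards after every navigation so the references stay fresh
    cards = driver.find_elements_by_xpath("//div[@class='el-card advertisement card is-always-shadow']")
    cards[i].click()
    print(driver.current_url)
    driver.back()
    time.sleep(5)  # crude; an explicit wait would be more robust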

Can't locate element on google chrome webpage

I'm trying to change the Chrome profile name on this page: chrome://settings/manageProfile, using Python and Selenium.
The field in question is the empty textbox in the top left corner (screenshot omitted).
The issue is that I can't access the element; I tried all of the following:
chromeProfilName = browser.find_element(By.XPATH, "//*[#id='profile-name']")
chromeProfilName = browser.find_element(By.ID, "profile-name")
chromeProfilName = browser.find_element(By.ID, "input")
I don't really understand how the HTML page is built, but when I inspect it, I find the textbox ID is "input". However, the value is stored in a span whose ID is "profile-name".
I always get the same error: "no such element: Unable to locate element". I don't have deep knowledge of Selenium, and I've already searched for answers online without finding anything.
Thanks!
To access the input element you need to traverse through the shadow root elements. Use the following querySelector chain to identify the input tag.
driver.get("chrome://settings/manageProfile")
profileInput = driver.execute_script('return document.querySelector("settings-ui").shadowRoot.querySelector("settings-main").shadowRoot.querySelector("settings-basic-page").shadowRoot.querySelector("settings-people-page").shadowRoot.querySelector("settings-manage-profile").shadowRoot.querySelector("cr-input").shadowRoot.querySelector("#input")')
profileInput.click()
profileInput.clear()
profileInput.send_keys("user676767")
The accepted answer works; I just want to add something.
For those who thought you have to work out the shadow-root path yourself: in reality, you don't.
In Chrome, for example (it's not possible in Firefox), right-click the element in the inspected source and copy its "JS path". Then you just paste that path into the execute_script function.
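For example, the full path from the accepted answer, copied as a "JS path", would be used like this (the string is split across lines only for readability):

js_path = ('document.querySelector("settings-ui").shadowRoot'
           '.querySelector("settings-main").shadowRoot'
           '.querySelector("settings-basic-page").shadowRoot'
           '.querySelector("settings-people-page").shadowRoot'
           '.querySelector("settings-manage-profile").shadowRoot'
           '.querySelector("cr-input").shadowRoot'
           '.querySelector("#input")')
profile_input = driver.execute_script('return ' + js_path)
profile_input.send_keys("user676767")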

Selenium Page Source is Missing Elements

I have a basic Selenium script that makes use of the chromedriver binary. I'm trying to display a page with reCAPTCHA on it, hang until the challenge has been completed, and then store the result in a variable for future use.
The roadblock I'm hitting is that I am unable to find the reCAPTCHA element.
#!/bin/env python2.7
import os
from selenium import webdriver
driverBin=os.path.expanduser("~/Desktop/chromedriver")
driver=webdriver.Chrome(driverBin)
driver.implicitly_wait(5)
driver.get('http://patrickhlauke.github.io/recaptcha/')
Is there anything special needed to be able to see this element?
Also, is there a way to grab the token after the user solves the challenge, without refreshing the page?
As it is now, the input with the recaptcha-token id is of hidden type. After solving, a second recaptcha-token id is created; that is the value I wish to store in a variable. I was thinking of looping over the number of elements found with that id and parsing once there is more than one, but I'm unsure whether the page source actually updates.
UPDATE:
With more research, it seems to come down to the nature of the element, particularly its tag: <input type="hidden">. So I guess to rephrase my question: how does one extract the value of a hidden element?
The element you are looking for (the input) is in an iframe. You'll need to switch to the iframe before you can locate the element and interact with it.
from selenium import webdriver

driver = webdriver.Chrome()
try:
    driver.implicitly_wait(5)
    driver.get('http://patrickhlauke.github.io/recaptcha/')

    # Find the iframe and switch to it
    iframe_path = '//iframe[@title="recaptcha widget"]'
    iframe = driver.find_element_by_xpath(iframe_path)
    driver.switch_to.frame(iframe)

    # Find the input element
    input_elem = driver.find_element_by_id("recaptcha-token")
    print("Found the input element: ", input_elem)
finally:
    driver.quit()
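To answer the rephrased question: a hidden <input> has no visible text, but its value can still be read with get_attribute(). A small follow-up sketch, to be placed inside the try block above after switching to the iframe:

# .text is empty for hidden elements, but get_attribute("value") still works
token = input_elem.get_attribute("value")
print("Token value:", token)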

Selenium + Python: StaleElementReferenceException (selecting by class)

I'm trying to write a simple Python script using Selenium; the loop runs once, and then I get a StaleElementReferenceException.
Here's the script I'm running:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('http://digital2.library.ucla.edu/Search.do?keyWord=&selectedProjects=27&pager.offset=50&viewType=1&maxPageItems=1000')
links = browser.find_elements_by_class_name('searchTitle')
for link in links:
    link.click()
    print("clicked!")
    browser.back()
I did try adding browser.refresh() to the loop, but it didn't seem to help.
I'm new to this, so please ... don't throw stuff at me, I guess.
It does not make sense to click through links inside a loop: once you click the first link, you are no longer on the page where you collected the links. Does that make sense?
Take the for loop out and add something like link = links[0] to grab the first link, or something more specific to grab the particular link you want to click on.
If the intention is to click on every link on the page, then you can try something like the following:
links = browser.find_elements_by_class_name('searchTitle')
for i in range(len(links)):
    links = browser.find_elements_by_class_name('searchTitle')  # re-find after navigating back
    link = links[i]  # specify the i'th link on the page
    link.click()
    print("clicked!")
    browser.back()
EDIT: This might also be as simple as adding a pause after the browser.back(). You can do that with the following:
from time import sleep
...
sleep(5) # specify pause in seconds
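As an aside (not part of the original answer): an explicit wait is usually more reliable than a fixed sleep. A sketch that waits for the result links to reappear after browser.back():

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 10 seconds for the search result links to come back
WebDriverWait(browser, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, 'searchTitle')))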
