Python selenium: running into StaleElementReferenceException

I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.
binary = FirefoxBinary('path_to_firefox_binary.exe')
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())

base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
           '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType='
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()

WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
    By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()

# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)

results = {}
for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
    results[index] = {'title': title, 'company': emp_name, 'description': description}
I keep running into the error message
selenium.common.exceptions.StaleElementReferenceException: Message:
The element reference of <element> is stale; either the element is no longer attached to the
DOM, it is not in the current frame context, or the document has been
refreshed
on the first line inside my for loop. Even when the loop runs for a number of iterations, it eventually raises this exception. I am new to Selenium and web scraping, and would appreciate any help.

Every time a new post is selected, the clicked element is modified, and therefore the DOM is refreshed. The change is slow, certainly in comparison to the actions in the loop, so you want to slow the loop down a little. Instead of using a fixed sleep, you can wait for the changes to occur.
Every time you select a posting, a new class, selected, is added to it and its style attribute loses its content. You should wait for this to happen, get the information, and then click the next post:
wait = WebDriverWait(driver, 20)
for index in range(n_listings - 1):
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    if index < n_listings - 1:
        driver.find_element_by_css_selector('.selected + .jl').click()

This error means the element you are trying to click on was not found. You first have to make sure the target element exists and then call click(), or wrap the call in a try/except block.
# ...
results = {}
for index in range(n_listings):
    try:
        driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    except:
        print('Listing not found, retrying in 1 second ...')
        time.sleep(1)
        continue
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
# ...
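The same retry idea can be factored into a small helper that re-runs any click action a bounded number of times. This is only a sketch; the helper name and retry policy are mine, not part of the original answer. The key point is that the callable re-finds the element on every attempt, so each retry works on a fresh element reference.

```python
import time

def click_with_retry(do_click, attempts=3, delay=1.0):
    """Call do_click() until it succeeds or the attempts run out.

    do_click should be a zero-argument callable that re-finds the element
    and clicks it, so every retry operates on a fresh element reference.
    Returns True on success, False if every attempt failed.
    """
    for _ in range(attempts):
        try:
            do_click()
            return True
        except Exception:
            time.sleep(delay)
    return False

# usage sketch (re-find the listing inside the lambda so it is never stale):
# click_with_retry(lambda: driver.find_elements_by_class_name("jl")[index].click())
```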


XPath Check if ID exists within drop down. Class check

Thanks in advance. Noobie to Python here; I am trying to automate value entries into a website via Selenium and the respective XPath values.
It is supposed to work like this: I send keys of the dynamic 'ID' into the input box, the ID pops up, and I select it. This works. Right now I am running into the issue where the ID does not exist and the tool ends up stalling out. I know I need an if statement with an elif/else branch for when the ID does not exist, but I am lost when it comes to these kinds of statements with XPaths and need a little guidance.
I have the class XPath of the pop up value stating the ID does not exist:
<li class="vv_list_no_items vv_item_indent listNoItems">No results match "1234567890123456"</li>
The confusing part is also having dynamic IDs where "1234567890123456" can be any ID.
Current code is below; sorry for the indenting, as this was grabbed out of a larger set of scripts.
try:
    wait = WebDriverWait(browser, 10)
    # Inputs Legal Entity
    elem = wait.until(EC.element_to_be_clickable((By.XPATH,
        "//*[@id='di3Form']/div[2]/div[2]/div/div[1]/div[3]/div/div[2]/div/div[1]/input"))).send_keys(LE)
    elem = wait.until(
        EC.element_to_be_clickable((By.XPATH, "//*[@id='veevaBasePage']/ul[3]/li/a"))).click()
    LE = None
    # Inputs WWID
    elem = wait.until(EC.element_to_be_clickable((By.XPATH,
        "//*[@id='di3Form']/div[2]/div[2]/div/div[1]/div[4]/div/div[2]/div/div[1]/input"))).send_keys(ID)
    elem = wait.until(
        EC.element_to_be_clickable((By.XPATH, "//*[@id='veevaBasePage']/ul[4]/li[2]/a/em"))).click()
    # Inputs Country
    elem = wait.until(EC.element_to_be_clickable((By.XPATH,
        "//*[@id='di3Form']/div[2]/div[2]/div/div[1]/div[5]/div/div[2]/div/div[1]/input"))).send_keys(Country)
    elem = wait.until(
        EC.element_to_be_clickable((By.XPATH, "//*[@id='veevaBasePage']/ul[5]/li/a"))).click()
    # Save
    elem = wait.until(EC.element_to_be_clickable((By.XPATH,
        "//a[@class='docInfoSaveButton save vv_button vv_primary']/span[@class='vv_button_text vv_ellipsis' and text()='Save']")))
    browser.execute_script("arguments[0].click();", elem)
    wait = WebDriverWait(browser, 15)
    # Click dropdown menu arrow
    elem = wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//*[@id='di3Header']/div[3]/div/div[2]/div[1]/div/div[2]/div/div/button")))
    browser.execute_script("arguments[0].click();", elem)
    wait = WebDriverWait(browser, 100)
    # Click "Publish"
    elem = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[6]/div/ul/li")))
    browser.execute_script("arguments[0].click();", elem)
    # Confirm Publish
    elem = wait.until(EC.element_to_be_clickable((By.XPATH,
        "//a[@class='save vv_button vv_primary']/span[@class='vv_button_text' and text()='Yes']")))
    browser.execute_script("arguments[0].click();", elem)
You can use XPath contains together with find_elements, which returns a list, and add an if condition: if its length is > 0, then the "No results match" string is present in the UI.
try:
    no_match = 'No results match' + ' ' + '"' + WWID + '"'
    if len(driver.find_elements(By.XPATH, "//li[contains(text(),'{}')]".format(no_match))) > 0:
        print('Code has found the "No results match" string')
        # do whatever you want to do here
    else:
        print('There is no locator that contains "No results match"')
except:
    print("Something went wrong")
    pass
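For the dynamic-ID part, the XPath can be built from the WWID at runtime and the check wrapped in a small helper. A minimal sketch; the helper names are mine, and `"xpath"` is the string value behind `By.XPATH`, so no extra import is needed here:

```python
def no_match_xpath(wwid):
    """Build the XPath for the 'No results match "<wwid>"' list item."""
    return '//li[contains(text(),\'No results match "{}"\')]'.format(wwid)

def id_exists(driver, wwid):
    """True if the dropdown did NOT show the no-results item for this WWID."""
    return len(driver.find_elements("xpath", no_match_xpath(wwid))) == 0

# usage sketch:
# if id_exists(browser, WWID):
#     ... click the popped-up ID ...
# else:
#     ... skip this WWID and move on to the next one ...
```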

Open link in new tab instead of clicking the element found by Class Name | Python

This is the link
https://www.unibet.eu/betting/sports/filter/football/matches
Using the Selenium driver, I access this link. (A screenshot of the page was attached here.)
The actual task for me is to click on each match link. I found all those matches by
elems = driver.find_elements_by_class_name('eb700')
When I did this
for elem in elems:
    elem.click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
The first time, it clicked, loaded the new page, went back to the previous page, and then gave the following error:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I also tried getting the href attribute from the elem, but it gave None. Is it possible to open the page in a new tab instead of clicking the elem?
You can retry clicking the element, since it is no longer present in the DOM.
Code:
driver = webdriver.Chrome("C:\\Users\\**\\Inc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.unibet.eu/betting/sports/filter/football/matches")
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "OK"))).click()
sleep(2)
elements = driver.find_elements(By.XPATH, "//div[contains(@class,'_')]/div[@data-test-name='accordionLevel1']")
element_len = len(elements)
print(element_len)
counter = 0
while counter < element_len:
    attempts = 0
    while attempts < 2:
        try:
            ActionChains(driver).move_to_element(elements[counter]).click().perform()
        except:
            pass
        attempts = attempts + 1
        sleep(2)
    # driver.execute_script("window.history.go(-1)")  # maybe get the team name
    # using the //div[@data-test-name='teamName'] xpath
    sleep(2)
    # driver.refresh()
    sleep(2)
    counter = counter + 1
Since you move to the next page, the elements no longer exist in the DOM, so you will get a stale element exception.
What you can do is re-fetch all the links (elems) when coming back to the same page, and use a while loop instead of a for loop:
elems = driver.find_elements_by_class_name('eb700')
i = 0
while i < len(elems):
    elems[i].click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    elems = driver.find_elements_by_class_name('eb700')
    i += 1
Another solution is to remain on the same page, save all the href attributes in a list, and then use driver.get to open each match link:
matchLinks = []
elems = driver.find_elements_by_class_name('eb700')
for elem in elems:
    matchLinks.append(elem.get_attribute('href'))
for match in matchLinks:
    driver.get(match)
    # do whatever you want to do on the match page
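To answer the new-tab part of the question directly: once the hrefs are collected, each link can be opened in a fresh tab via window.open and the window handles, instead of navigating the current tab. A sketch; the helper name is mine, and it assumes the collected hrefs are absolute URLs:

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch the driver to it, and return the original handle."""
    original = driver.current_window_handle
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    # the new handle is whichever one we were not already on
    new_handle = [h for h in driver.window_handles if h != original][0]
    driver.switch_to.window(new_handle)
    return original

# usage sketch:
# for match in matchLinks:
#     original = open_in_new_tab(driver, match)
#     ...scrape the match page...
#     driver.close()
#     driver.switch_to.window(original)
```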

How to wait for an element to be filled with text

I use Selenium + Python and work on a page whose data is filled in by JS and refreshes every 10 seconds, but that is not important because I will only run it once a week. I want to wait until the td with id='e5' gets its value, or rather until the site is fully loaded. The site address is the one in the code below, but I can't find a suitable expected condition for this job:
driver = webdriver.Firefox()
driver.get('http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=2400322364771558')
WebDriverWait(driver, 10).until(EC.staleness_of((By.ID, 'e5')))
print(driver.find_element_by_id('e5').text)
driver.close()
I am talking about this tag, in case you can't find it.
There is an Expected Condition for that:
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'e5')))
You don't need to wait for the element to be stale, you need it to be visible on the DOM.
Try this code to constantly check the element's text value:
import time

pause = 1  # interval between value checks, in seconds
field = driver.find_element_by_id('e5')
value = field.text
while True:
    if field.text != value:
        value = field.text
        print(value)
    time.sleep(pause)
If you want to use WebDriverWait, try:
field = driver.find_element_by_id('e5')
value = field.text
while True:
    WebDriverWait(driver, float('inf')).until(lambda driver: field.text != value)
    value = field.text
    print(value)
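Since the question is really "wait until the td has any text at all", it may help that WebDriverWait also accepts any plain callable, so a small custom condition avoids polling by hand. A sketch; the function name is mine, and the locator tuple uses the string "id", which is what By.ID resolves to:

```python
def non_empty_text(locator):
    """Expected-condition-style callable: returns the element's text once it is non-empty.

    Returning a falsy value makes WebDriverWait keep polling; returning the
    text makes .until() hand that text back to the caller.
    """
    def _predicate(driver):
        text = driver.find_element(*locator).text.strip()
        return text if text else False
    return _predicate

# usage sketch:
# value = WebDriverWait(driver, 10).until(non_empty_text(("id", "e5")))
# print(value)
```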

How do I deal with "Message: stale element reference: element is not attached to the page document" in Python Selenium

I'm writing a script to scrape product names from a website, filtered by brands. Some search results may contain more than one page, and this is where the problem comes in. I'm able to scrape the first page but when the script clicks on the next page the error message selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document shows. Below is my code:
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        resultList.append(titleResults)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    for result in resultList[0]:
        print("Result: {}".format(result.text))
I think the error got triggered when .click() was called. I've done a lot of searching on the internet before resorting to posting this question here because either I don't understand the solutions from other articles/posts or they don't apply to my case.
A stale element means an old element that is no longer available.
I think the error is caused by the last line.
You should extract the elements' text before the elements become unavailable:
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        # Extract the elements' text
        results_text = [titleResults[i].text for i in range(0, len(titleResults))]
        resultList.extend(results_text)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    print("Result: {}".format(resultList))
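One further guard that can help here: after clicking Next, wait until one of the old title elements actually goes stale before reading the new page; otherwise the wait may return the old, soon-to-be-stale results. Selenium ships EC.staleness_of for exactly this; a hand-rolled equivalent (names are mine) looks like:

```python
def went_stale(element):
    """Condition callable: True once any call on the element raises (it left the DOM)."""
    def _predicate(driver):
        try:
            element.is_enabled()  # any WebElement call raises once the reference is stale
            return False
        except Exception:
            return True
    return _predicate

# usage sketch, inside the paging loop:
# old_first = titleResults[0]
# driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
# WebDriverWait(driver, 10).until(went_stale(old_first))
```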

No data returned when using function to get data from site using selenium

I have a list with a few boxers.
I have created a function that uses Selenium to open a particular web page and then types the name of a given boxer from my list into an autocomplete box. I then click on the first name that appears, then on a 'view more' link (href), and I continue clicking 'view more' until I get to the bottom of the page.
I then read all the text with the list of fights the boxer has fought and append it to a dataframe column. I have tested this code with one boxer and know that it definitely works up until this point.
I then click on a 'download stats' button for each fight in the list of fights I have collected and get the text from the form that pops up. I then append all this information to another column.
This is the code I have written:
boxer_list = ['Deontay Wilder', 'Tyson Fury', 'Andy Ruiz']

def get_boxers(boxers):
    page_link = 'http://beta.compuboxdata.com/fighter'
    chromedriver = 'C:\\Users\\User\\Downloads\\chromedriver'
    cdriver = webdriver.Chrome(chromedriver)
    cdriver.maximize_window()
    cdriver.get(page_link)
    wait = WebDriverWait(cdriver, 10)
    wait.until(EC.visibility_of_element_located((By.ID, 's2id_autogen1'))).send_keys(str(boxers))
    wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'select2-result-label'))).click()
    while True:
        try:
            element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'view_more')))
            element.click()
        except TimeoutException:
            break
    fighters = cdriver.find_elements_by_xpath("//div[@class='row row-bottom-margin-5']/div[2]")
    cols = ['fighters', 'stats']
    fight_data = pd.DataFrame(columns=cols)
    for fighter in fighters:
        fight_data.append({'fighters': fighter.text}, ignore_index=True)
        cdriver.find_element_by_xpath("//a[contains(@onclick,'get_fight_report')]").click()
        punches = cdriver.find_elements_by_xpath("//div[@class='modal-content']")
        for punch in punches:
            fight_data.append({'stats': punch.text}, ignore_index=True)
        cdriver.refresh()
    return fight_data

boxing_dict = {boxer.replace(' ', ''): pd.DataFrame([get_boxers(boxer)]) for boxer in boxer_list}
However when I run my function I get the error message:
ElementClickInterceptedException Traceback (most recent call last)
<ipython-input-204-1e27dbb96927> in <module>
35 cdriver.refresh()
36 return fight_data
---> 37 boxing_dict = {boxer.replace(' ',''):pd.DataFrame([get_boxers(boxer)]) for boxer in boxer_list}
..............
ElementClickInterceptedException: Message: element click intercepted: Element <a class="view_more" href="javascript:void(0);" onclick="_search('0')">...</a> is not clickable at point (274, 774). Other element would receive the click: <div class="col-xs-2 col-sm-2 col-md-2 col-lg-2">...</div>
(Session info: chrome=78.0.3904.108)
So from what I am gathering, the code works up until this point:
for fighter in fighters:
    fight_data.append({'fighters': fighter.text}, ignore_index=True)
I cannot, while looping through each of the fights in the list, click on the download button and get the punch stats from the pop-up. This appears to be where the error message is coming from (I assume).
UPDATE:
I have tried updating my for loop to wait before clicking on the download button for each fight in my loop:
for fighter in fighters:
    fight_data.append({'fighters': fighter.text}, ignore_index=True)
    elem = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@onclick,'get_fight_report')]")))
    elem.click()
    punches = cdriver.find_elements_by_xpath("//div[@class='modal-content']")
    for punch in punches:
        fight_data.append({'stats': punch.text}, ignore_index=True)
Running this with a hardcoded key in send_keys() still returns an empty dataframe
wait.until(EC.visibility_of_element_located((By.ID,'s2id_autogen1'))).send_keys('Deontay Wilder')
Is the hardcoded 'Deontay Wilder' in your function definition intentional?
Edit
Till now I can only suggest changing your code with this line (I added another row to the class; without it, fighters returned an empty list):
fighters = cdriver.find_elements_by_xpath("//div[#class='row row-bottom-margin-5']/div[2]")
I can print the fighter.text values in the console, but I can't append anything to fight_data, not even a hardcoded string. I don't understand why...
Edit2
boxer_list = ['Deontay Wilder', 'Tyson Fury', 'Andy Ruiz']

def get_boxers(boxers):
    page_link = 'http://beta.compuboxdata.com/fighter'
    chromedriver = 'C:\\Users\\User\\Downloads\\chromedriver'
    cdriver = webdriver.Chrome(chromedriver)
    cdriver.maximize_window()
    cdriver.get(page_link)
    wait = WebDriverWait(cdriver, 10)
    wait.until(EC.visibility_of_element_located((By.ID, 's2id_autogen1'))).send_keys(str(boxers))
    wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'select2-result-label'))).click()
    while True:
        try:
            element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'view_more')))
            element.click()
        except:
            break
    fighters = cdriver.find_elements_by_xpath("//div[@class='row row-bottom-margin-5']/div[2]")
    cols = ['fighters', 'stats']
    fight_data = pd.DataFrame(columns=cols)
    for fighter in fighters:
        fight_data = fight_data.append({'fighters': fighter.text}, ignore_index=True)
        cdriver.find_element_by_xpath("//a[contains(@onclick,'get_fight_report')]").click()
        punches = cdriver.find_elements_by_xpath("//div[@class='modal-content']")
        for punch in punches:
            fight_data = fight_data.append({'stats': punch.text}, ignore_index=True)
        cdriver.refresh()
    return fight_data
I don't know how you want to collect the stats; it's a pretty giant form, so punch.text returns an empty list.
The first part, in the meantime, is solved.
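As for the ElementClickInterceptedException itself ("Other element would receive the click"), a common workaround is to scroll the target into view and click it through JavaScript, so an overlapping element cannot swallow the click. A sketch under that assumption; the helper name is mine:

```python
def safe_click(driver, element):
    """Scroll the element to the viewport centre, then click it via JavaScript."""
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    driver.execute_script("arguments[0].click();", element)

# usage sketch:
# link = cdriver.find_element_by_xpath("//a[contains(@onclick,'get_fight_report')]")
# safe_click(cdriver, link)
```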
