The element is not attached - Selenium in Python

I am trying to scrape data from a number of pages on a website using Selenium in Python. The script runs and scrapes data successfully on the first page, but from the second page on it can't find the click button and stops scraping. I checked the HTML of the webpage, and the element on the second page is the same as the one on the first page. I found this question related to the same issue. I think the problem is that the reference to the button is lost after the DOM changes, but I still can't fix the issue properly. I would appreciate any suggestions or solutions. The code and results are included below:
browser = webdriver.Chrome(r"C:\Users\...\chromedriver.exe")
browser.get('https://fortune.com/global500/2019/walmart')
table = browser.find_element_by_css_selector('tbody')

data = []
# Use For Loop for Index
i = 1
while True:
    if i > 5:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        # Select rows in the table
        for row in table.find_elements_by_css_selector('tr'):
            cols = data.append([cell.text for cell in row.find_elements_by_css_selector('td')])
        try:
            WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//span[@class="singlePagination__icon--2KbZn"]')))
            time.sleep(10)
        finally:
            browser.find_element_by_xpath('//span[@class="singlePagination__icon--2KbZn"]').click()
    except Exception as e:
        print(e)
        break

data1 = pd.DataFrame(data, columns=['Labels', 'Value'])
print(data1)
browser.close()
output:
Scraping Page no. 1
Scraping Page no. 2
Message: stale element reference: element is not attached to the page document
(Session info: chrome=....)
Labels Value
0 (...) (...)
1 (...) (...)

Move the table = browser.find_element_by_css_selector('tbody') line into your while loop, so that you get a fresh reference to the table element on each iteration. Then you should not see any stale element issue.
while True:
    table = browser.find_element_by_css_selector('tbody')
    if i > 5:
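For completeness, here is a minimal sketch of how the whole loop could look once the table lookup is moved inside it. The selectors and waits are taken from the question above; the snippet is otherwise untested:

data = []
i = 1
while True:
    if i > 5:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        # Re-locate the table on every pass so the reference is never stale
        table = browser.find_element_by_css_selector('tbody')
        for row in table.find_elements_by_css_selector('tr'):
            data.append([cell.text for cell in row.find_elements_by_css_selector('td')])
        # Wait until the pagination arrow is clickable, then move to the next page
        WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//span[@class="singlePagination__icon--2KbZn"]'))
        ).click()
    except Exception as e:
        print(e)
        break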

Related

Crawler gets stuck at the last page

I have some crawler code using chromedriver and Selenium in Python which goes through different pages.
However, it does not reach the last page.
For example, with a maximum of 9 pages and 10 rows per table, it keeps scraping page 8 and then starts over with page 8 again, infinitely.
My looping code looks like this:
def extract(page):
    while 1:
        pc = 1
        print("Extracting Page: " + str(page))
        while pc <= 10:
            colpageprod(pc)
            browser.back()
            waitSmall()
            pc += 1
        try:
            np = browser.find_element_by_xpath('//li[@class="next"]/a').click()
            waitSmall()
        except:
            pass
        page += 1
        if page < 1:
            break

try:
    extract(1)
The problem is that on the website the second-to-last page has no "next" button.
How can this be handled in the code?
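One general way to deal with a page that has no "next" link (a common pattern, not something from the original thread) is to stop paginating as soon as the element can no longer be located, instead of swallowing the failure with except: pass. The inner try/except of the function above could become:

from selenium.common.exceptions import NoSuchElementException

try:
    browser.find_element_by_xpath('//li[@class="next"]/a').click()
    waitSmall()
except NoSuchElementException:
    # No "next" link on this page, so treat it as the last page and stop
    break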

Open link in new tab instead of clicking the element found by Class Name | Python

This is the link:
https://www.unibet.eu/betting/sports/filter/football/matches
Using the Selenium driver, I access this link. This is what we have on the page.
The actual task for me is to click on each of the match links. I found all those matches with
elems = driver.find_elements_by_class_name('eb700')
When I did this
for elem in elems:
    elem.click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
The first time it clicked, the new page loaded, it went back to the previous page, and then it gave the following error
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I also tried getting the href attribute from the elem, but it gave None. Is it possible to open the page in a new tab instead of clicking the elem?
You can retry the click on the element, since it is no longer present in the DOM after the first navigation.
Code:
driver = webdriver.Chrome("C:\\Users\\**\\Inc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.unibet.eu/betting/sports/filter/football/matches")
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "OK"))).click()
sleep(2)
elements = driver.find_elements(By.XPATH, "//div[contains(@class,'_')]/div[@data-test-name='accordionLevel1']")
element_len = len(elements)
print(element_len)
counter = 0
while counter < element_len:
    attempts = 0
    while attempts < 2:
        try:
            ActionChains(driver).move_to_element(elements[counter]).click().perform()
        except:
            pass
        attempts = attempts + 1
    sleep(2)
    # driver.execute_script("window.history.go(-1)")  # maybe get the team name
    # using the //div[@data-test-name='teamName'] xpath
    sleep(2)
    # driver.refresh()
    sleep(2)
    counter = counter + 1
Since you move to the next page, the elements no longer exist in the DOM, so you get a stale element exception.
What you can do is, when coming back to the same page, get all the links again (elems) and use a while loop instead of a for loop.
elems = driver.find_elements_by_class_name('eb700')
i = 0
while i < len(elems):
    elems[i].click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    elems = driver.find_elements_by_class_name('eb700')
    i += 1
The other solution is to remain on the same page, save all the href attributes in a list, and then use driver.get to open each match link.
matchLinks = []
elems = driver.find_elements_by_class_name('eb700')
for elem in elems:
    matchLinks.append(elem.get_attribute('href'))
for match in matchLinks:
    driver.get(match)
    # do whatever you want to do on the match page
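If you specifically want the new-tab behaviour asked about in the question, here is a hedged sketch building on the href list above. The window-handle switching is standard Selenium; the snippet assumes the hrefs are actually populated and is untested against the site:

elems = driver.find_elements_by_class_name('eb700')
matchLinks = [elem.get_attribute('href') for elem in elems]

for link in matchLinks:
    driver.execute_script("window.open(arguments[0]);", link)  # open the match in a new tab
    driver.switch_to.window(driver.window_handles[-1])         # switch to that tab
    # ... scrape the match page here ...
    driver.close()                                              # close the tab
    driver.switch_to.window(driver.window_handles[0])           # return to the list tab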

How do I deal with "Message: stale element reference: element is not attached to the page document" in Python Selenium

I'm writing a script to scrape product names from a website, filtered by brands. Some search results may contain more than one page, and this is where the problem comes in. I'm able to scrape the first page but when the script clicks on the next page the error message selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document shows. Below is my code:
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        resultList.append(titleResults)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    for result in resultList[0]:
        print("Result: {}".format(result.text))
I think the error got triggered when .click() was called. I've done a lot of searching on the internet before resorting to posting this question here because either I don't understand the solutions from other articles/posts or they don't apply to my case.
A stale element means an old element, one that is no longer available.
I think the error is caused by the last lines:
you should extract the elements' text before the elements become unavailable.
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        # Extract the elements' text while they are still attached to the DOM
        results_text = [titleResults[i].text for i in range(0, len(titleResults))]
        resultList.extend(results_text)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    print("Result: {}".format(resultList))

Python selenium: running into StaleElementReferenceException

I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.
binary = FirefoxBinary('path_to_firebox_binary.exe')
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())

base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
           '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType= '
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()

WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
    By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()

# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)

results = {}
for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
    results[index] = {'title': title, 'company': emp_name, 'description': description}
I keep running into the error message
selenium.common.exceptions.StaleElementReferenceException: Message:
The element reference of is stale; either the element is no longer attached to the
DOM, it is not in the current frame context, or the document has been
refreshed
for the first line inside my for loop. Even if the for loop runs for a number of iterations, it eventually leads to the exception. I am new to Selenium and web scraping and would appreciate any help.
Every time a new post is selected, the clicked element is modified, and therefore the DOM is refreshed. The change is slow, certainly in comparison to the actions in the loop, so what you want to do is slow it down a little. Instead of using a fixed sleep, you can wait for the changes to occur.
Every time you select a posting, a new class, selected, is added and the style attribute loses its content. You should wait for this to happen, get the information, and click the next post.
wait = WebDriverWait(driver, 20)
for index in range(n_listings - 1):
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    if index < n_listings - 1:
        driver.find_element_by_css_selector('.selected + .jl').click()
This error means the element you are trying to click on was not found. You have to first make sure the target element exists and then call click(), or wrap the click in a try/except block.
# ...
results = {}
for index in range(n_listings):
    try:
        driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    except:
        print('Listing not found, retrying in 1 second ...')
        time.sleep(1)
        continue
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
# ...
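As an alternative to a fixed time.sleep, you could re-query the listings with an explicit wait on each iteration. This is only a possible variant using the standard WebDriverWait API, untested against Glassdoor's markup:

from selenium.common.exceptions import StaleElementReferenceException

wait = WebDriverWait(driver, 10)
results = {}
for index in range(n_listings):
    # Re-fetch the listing elements so the reference is fresh after each page update
    listings = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "jl")))
    try:
        listings[index].click()
    except StaleElementReferenceException:
        continue  # the list re-rendered mid-click; move on to the next listing
    print("clicked listing {}".format(index + 1))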

(python) stale element reference: element is not attached to the page document

I am scraping Amazon products, but first I want to click on each category. The code works only for the first category in the loop and then throws the error below. I searched for this and found many answers, but they didn't work inside the loop, and all of them work with an XPath for one element, not a list of elements.
The first click (see_more) works; the problem is with the clicks inside the loop.
ERROR:
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=80.0.3987.149)
Here is the code.
from selenium import webdriver
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.webdriver import ActionChains
from bs4 import BeautifulSoup
from csv import writer
import time

driver = webdriver.Chrome(executable_path='C:\\Users\\Compu City\\Desktop\\chromedriver.exe')
driver.get('https://www.amazon.com/international-sales-offers/b/?ie=UTF8&node=15529609011&ref_=nav_navm_intl_deal_btn')
time.sleep(10)

res = driver.execute_script("return document.documentElement.outerHTML", 'window.scrollBy(0,2000)')
soup = BeautifulSoup(res, 'lxml')
cat = []

filter_con = driver.find_element_by_id('widgetFilters')  # main container of products
cats = driver.find_elements_by_css_selector('.a-expander-container .a-checkbox label .a-label')
see_more = driver.find_element_by_css_selector('#widgetFilters > div:nth-child(1) > div.a-row.a-expander-container.a-expander-inline-container > a > span')
ActionChains(driver).move_to_element(filter_con).click(see_more).perform()

cat = 0
while cat < len(cats):
    print(cat)
    print(cats[cat].text)
    action = ActionChains(driver).move_to_element(filter_con).click(cats[cat]).perform()
    cat += 1
The moment you click on the cat element, the references in cats are refreshed, meaning Selenium gets a new set of references to the elements. Because you are still pointing to the older references, you get the stale element exception. Update your code as below.
Option 1: Fix in the existing code
while cat < len(cats):
    currentCat = driver.find_elements_by_css_selector('.a-expander-container .a-checkbox label .a-label')[cat]
    print(cat)
    print(currentCat.text)
    action = ActionChains(driver).move_to_element(filter_con).click(currentCat).perform()
    cat += 1
Option 2: Using a for loop (without ActionChains)
for catNumber in range(len(cats)):
    cat = driver.find_elements_by_css_selector('.a-expander-container .a-checkbox label .a-label')[catNumber + 1]
    print(catNumber + 1)
    # scroll to the element
    cat.location_once_scrolled_into_view
    # click
    cat.click()
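A possible variant of the same idea with an explicit wait, in case the filter panel re-renders slowly after each click. The selector comes from the question; the WebDriverWait calls are standard Selenium, but the snippet is untested against Amazon's markup:

wait = WebDriverWait(driver, 10)
for catNumber in range(len(cats)):
    # Re-locate the category labels after every click so the reference is always fresh
    current = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, '.a-expander-container .a-checkbox label .a-label')))[catNumber]
    current.location_once_scrolled_into_view
    current.click()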
