Selenium error: Element is no longer attached to the DOM - python

I am trying to get a list of proxies from https://sslproxies.org/ using Selenium (headless via PhantomJS) and BeautifulSoup:
This is what I did so far:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("https://sslproxies.org/")

while True:
    try:
        next_button = driver.find_element_by_xpath("//li[@class='paginate_button next'][@id='proxylisttable_next']")
    except:
        break
    next_button.click()
    soup = BeautifulSoup(next_button.get_attribute('innerHTML'), 'html.parser')
But I get this error:
"errorMessage":"Element is no longer attached to the DOM"

You are defining next_button, clicking that button, and then trying to reference the next_button variable again. Your click has navigated to another page with a brand new DOM, so the element reference stored in next_button has gone stale. To avoid this you can simply redefine the variable after every navigation, or just always use the whole call:
driver.find_element_by_xpath("//li[@class='paginate_button next'][@id='proxylisttable_next']")
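A minimal sketch of that pattern, re-locating the button on each iteration (the proxylisttable id is taken from your own locator; the parsing step is only illustrative):

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://sslproxies.org/")

while True:
    # Parse the rows of the current page before navigating away.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    table = soup.find('table', id='proxylisttable')  # id assumed from the locator above
    if table is not None and table.tbody is not None:
        for row in table.tbody.find_all('tr'):
            print(row.get_text(" ", strip=True))
    try:
        # Re-locate the button every loop instead of reusing a stale reference.
        next_button = driver.find_element_by_xpath(
            "//li[@class='paginate_button next'][@id='proxylisttable_next']")
    except Exception:
        break
    next_button.click()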

1. You can iterate through the pages with a for loop, but for that you need the number of pages. How to get that number depends on the site; in your case it is easy: take the length of the page-locator list, like this: len(driver.find_elements_by_xpath("//li[@class='paginate_button ']")).
2. Your locator was incorrect, so I changed it to //li[@class='paginate_button next'][@id='proxylisttable_next']/a (added /a).
3. After finding the button, you click it in the finally block.
SOLUTION
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
driver.set_window_size(1120, 550)
driver.get("https://sslproxies.org/")
wait = WebDriverWait(driver, 10)

length = len(driver.find_elements_by_xpath("//li[@class='paginate_button ']"))
print(f"List length is: {length}")
for j in range(1, length + 1):
    try:
        print("Clicking Page " + str(j + 1))
        wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, "section[id='list']")))
        wait.until(EC.element_to_be_clickable(
            (By.XPATH, "//li[@class='paginate_button next'][@id='proxylisttable_next']/a")))
    finally:
        next_button = driver.find_element_by_xpath(
            "//li[@class='paginate_button next'][@id='proxylisttable_next']/a")
        next_button.click()
P.S. I tested it on Chrome, but it should work in any browser, as I use stable locators and waits.
My output for debug:
List length is: 4
Clicking Page 2
Clicking Page 3
Clicking Page 4
Clicking Page 5

Related

Selenium scrolling randomly

So I am trying to scrape data from a table from several hundred pages on a website. Here is part of what I have so far:
driver.get("link")
driver.maximize_window()
window_before = driver.window_handles[0]
driver.switch_to.window(window_before)
wait = WebDriverWait(driver, 10)
driver.execute_script("window.scrollTo(0, 350)")
games = driver.find_elements(By.XPATH, '//*[@id="schedule"]/tbody/tr')
This code only works sometimes. If I run this chunk 10 times, only 5 times will the website actually scroll down. I tried using this:
for i in range(0, 2): driver.find_element(By.XPATH, '//*[@id="meta"]/div[1]/p[1]/a').send_keys(Keys.DOWN)
but the same issue arises. Sometimes that scrolls down the amount I need, other times it does nothing, and other times it scrolls the entire page.
This part of my code navigates to the first link I need to click and on the next page I need to scroll another page, where the same issue is present. This is all part of a loop that goes through several hundred pages to read html tables, so even if it works the first 50 times, I won't get all the data I need.
Edit: Directly after the above snippet I have this:
for idx, game in enumerate(games):
    driver.find_element(By.XPATH, '/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[' + str(idx + 1) + ']/td[6]/a').click()
Which is where I get the "element is not clickable at point (X, Y)" error.
Am I doing something wrong here, or is there a work around to accomplish my goal?
Here is one way to access the href attribute of every 'Box Score' link from that page (according to OP's clarification in comments):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)
url = 'https://www.basketball-reference.com/leagues/NBA_2014_games-october.html'
browser.get(url)
# print(browser.page_source)
# browser.maximize_window()
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@class="qc-cmp2-summary-section"]'))).click()
    print('clicked cookie parent')
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@mode="primary"]'))).click()
    print('accepted cookies')
except Exception as e:
    print('no cookies')
wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="all_schedule"]'))).location_once_scrolled_into_view
table_with_score_links = wait.until(EC.presence_of_element_located((By.XPATH, '//table[@id="schedule"]')))
# print(table_with_score_links.get_attribute('outerHTML'))
links_from_table = [x.get_attribute('href') for x in table_with_score_links.find_elements(By.TAG_NAME, 'a') if x.text == 'Box Score']
print(links_from_table)
Result printed in terminal:
clicked cookie parent
accepted cookies
['https://www.basketball-reference.com/boxscores/201310290IND.html', 'https://www.basketball-reference.com/boxscores/201310290MIA.html', 'https://www.basketball-reference.com/boxscores/201310290LAL.html', 'https://www.basketball-reference.com/boxscores/201310300CLE.html', 'https://www.basketball-reference.com/boxscores/201310300TOR.html', 'https://www.basketball-reference.com/boxscores/201310300PHI.html', 'https://www.basketball-reference.com/boxscores/201310300DET.html', 'https://www.basketball-reference.com/boxscores/201310300NYK.html', 'https://www.basketball-reference.com/boxscores/201310300NOP.html', 'https://www.basketball-reference.com/boxscores/201310300MIN.html', 'https://www.basketball-reference.com/boxscores/201310300HOU.html', 'https://www.basketball-reference.com/boxscores/201310300SAS.html', 'https://www.basketball-reference.com/boxscores/201310300DAL.html', 'https://www.basketball-reference.com/boxscores/201310300UTA.html', 'https://www.basketball-reference.com/boxscores/201310300PHO.html', 'https://www.basketball-reference.com/boxscores/201310300SAC.html', 'https://www.basketball-reference.com/boxscores/201310300GSW.html', 'https://www.basketball-reference.com/boxscores/201310310CHI.html', 'https://www.basketball-reference.com/boxscores/201310310LAC.html']
I tried to make the variable names as descriptive as possible, and also left some commented-out lines of code to show the thought process and the build-up to the end goal.
You can now go through those links one by one, etc.
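A follow-up sketch under the same imports (waiting for a generic table tag is an assumption; adjust the readiness condition to whatever the box-score pages need):

# Visit each collected box-score link and grab its page source.
for link in links_from_table:
    browser.get(link)
    wait.until(EC.presence_of_element_located((By.TAG_NAME, 'table')))  # assumed readiness check
    html = browser.page_source
    print(link, len(html))  # placeholder for the real table parsing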
Selenium documentation can be found here: https://www.selenium.dev/documentation/

Selenium is returning empty text for elements that definitely have text

I'm practicing scraping my university's course catalog. I have a few lines of Python that open the url in Chrome and click the search button to bring up the course catalog. When I go to extract the text using find_elements_by_xpath(), it returns blank. When I use the dev tools in Chrome, there definitely is text there.
from selenium import webdriver
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
course = driver.find_elements_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]')
print(course)
I'm trying to extract the text from the element 'OSR_CAT_SRCH_OSR_CRSE_HEADER'. I don't understand why it's not returning the text values, especially when I can see in dev tools that it contains text.
You are not reading the element's text property, which is why you are not getting the text. Note that find_elements_by_xpath() returns a list, so switch to the singular find_element_by_xpath() (or index into the list) before calling .text:
course = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]').text
Try the above change in the second-to-last line.
Below is the full code after the changes
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
# wait up to 10 seconds for the course header to be present
course = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]'))
).text
print(course)
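If you want every course header rather than just the first row, a small sketch (assuming the PeopleSoft grid ids follow the OSR_CAT_SRCH_OSR_CRSE_HEADER$<row> pattern seen above):

# Collect the text of every course header row in the results grid.
courses = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.XPATH, '//*[starts-with(@id, "OSR_CAT_SRCH_OSR_CRSE_HEADER$")]')))
for course_row in courses:
    print(course_row.text)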

Dynamically generated element - NoSuchElementException: Message: no such element: Unable to locate element?

I am having trouble selecting a load more button on a Linkedin page. I receive this error in finding the xpath: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
I suspect that the issue is that the button is not visible on the page at that time. So I have tried actions.move_to_element. However, the page scrolls just below the element, so that the element is no longer visible, and the same error subsequently occurs.
I have also tried move_to_element_with_offset, but this hasn't changed where the page scrolls to.
How can I scroll to the right location on the page such that I can successfully select the element?
My relevant code:
import parameters
from time import sleep  # needed for the sleep() calls below
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('C:\\Users\\Root\\Downloads\\chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys(parameters.linkedin_username)
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys(parameters.linkedin_password)
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
After playing around with it, I seem to have figured out where the problem is stemming from. The error
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]"}
(Session info: chrome=81.0.4044.113)
always correctly states the problem it is encountering: it is not able to find the element. The possible causes of this include:
Element not present at the time of execution
Dynamically generated content
Conflicting names
In your case, it was the second point. The displayed content is loaded dynamically as you scroll down, so when your profile first loads, the skills section isn't actually present in the DOM yet. To solve this, you simply have to scroll to the section so that it gets added to the DOM.
This line is the trick: it positions the page at the correct panel, which loads the data and applies it to the DOM.
driver.execute_script("window.scrollTo(0, 1800)")
Here's my code (Please change it as necessary)
from time import sleep
# import parameters
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('../chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys('')
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys('')
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
sleep(3)
driver.execute_script("window.scrollTo(0, 1800)")
sleep(3)
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
Update
Regarding your newer problem, you need to implement a continuous scroll method that dynamically loads the skills section as you go. That requires quite a few changes and should ideally be asked as another question.
I have also found a simple solution: set the scroll to the correct threshold. y=3200 seems to work fine for all the profiles I've checked, including yours, mine and a few others.
driver.execute_script("window.scrollTo(0, 3200)")
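For reference, a minimal continuous-scroll sketch (the step delay and stop condition are assumptions to tune per page):

import time

# Scroll to the bottom in steps until the page height stops growing,
# giving lazily loaded sections time to render.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)  # assumed delay for new content to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height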
If the button is not visible on the page at load time, use the until method to delay execution until it appears:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Button is rdy!")
except TimeoutException:
    print("Loading took too much time!")
Example is taken from here
To get the exact location of the element, you can use the following method to do so.
element = driver.find_element_by_id('some_id')
element.location_once_scrolled_into_view
This actually returns the (x, y) coordinates of the element on the page, but it also scrolls right down to the target element as a side effect. You can then use the coordinates, or the now-visible element, to make the click on the button. You can read more on that here.
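A short sketch of that idea (the element id is a placeholder):

from selenium.webdriver.common.action_chains import ActionChains

element = driver.find_element_by_id('some_id')  # placeholder id
# Reading the property scrolls the element into view and returns its coordinates.
coords = element.location_once_scrolled_into_view
print(coords)  # e.g. {'x': 0, 'y': 1800}
# With the element now in view, the click can be performed on it.
ActionChains(driver).move_to_element(element).click().perform()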
You get a NoSuchElementException error when the locator (i.e. id / xpath / name / class_name / css selector, etc.) mentioned in the Selenium code is unable to find the web element on the web page.
How to resolve NoSuchElementException:
Apply WebDriverWait: allow the webdriver to wait for a specific time
Use a try/except block (see the sketch below)
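A minimal sketch combining the two points above (the locator and timeout are placeholders):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 10 seconds for the element before giving up.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'placeholder-id')))
    element.click()
except TimeoutException:
    print("Element never appeared in the DOM")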
Before performing an action on a web element you need to bring it into view. I have removed the unwanted code and avoided hard-coded waits, as they are not a good way to deal with synchronization issues. Also, before clicking the Show more button you have to scroll down, otherwise it will not work.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path="path of chromedriver.exe")
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
driver.maximize_window()
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_key"))).send_keys("email id")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_password"))).send_keys("password ")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//button[@class='btn__primary--large from__button--floating']"))).click()
driver.get("https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk")
driver.maximize_window()
driver.execute_script("scroll(0, 250);")
buttonClick = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//span[text()='Show more']")))
ActionChains(driver).move_to_element(buttonClick).click().perform()

Scroll with Keys.PAGE_DOWN in Selenium Python

Hello everyone, can anyone help me with scrolling https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products ?
I want to scroll it using:
actions = ActionChains(browser)
actions.send_keys(Keys.PAGE_DOWN)
actions.perform()
until it reaches the bottom of the scroll, where it will find a "Load More" element:
loadMoreButton = browser.find_element_by_css_selector(
    ".btn.list-view__load-more.list-view__load-more--js")
loadMoreButton.click()
and then, once the Load More button is clicked, it has to perform the scroll action again and then the load-more action again, until the Load More button is no longer available.
I have to use this page-down action because the element does not load until the page has been scrolled down to it. If anyone could suggest a solution it would be a great help.
This has worked for me with zero issues...
from selenium.webdriver.common.keys import Keys
driver.find_element_by_tag_name('body').send_keys(Keys.PAGE_DOWN)
To scroll the page https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products till it reaches the bottom, find the element with text as View More, and then click the element until it is no longer available, you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import TimeoutException
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
browser=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
browser.get("https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products")
while True:
    try:
        browser.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='btn list-view__load-more list-view__load-more--js' and normalize-space()='View More']"))))
        browser.execute_script("arguments[0].click();", WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='btn list-view__load-more list-view__load-more--js' and normalize-space()='View More']"))))
        print("View More button clicked")
    except (TimeoutException, StaleElementReferenceException) as e:
        print("No more View More buttons")
        break
browser.quit()
Console Output:
View More button clicked
View More button clicked
No more View More buttons
@PedroLobito I am trying to retrieve the product links, can you help me with this?
No need for selenium in this case, just sniff the xhr requests via developer tools and go straight to the gold (json).
The url structure for products is as follows:
https://www.x.com/product/anything-Item#
Just add the Item # value in the json object at the end of the url, something like:
https://www.x.com/product/anything-5P540
https://www.x.com/product/anything-5P541
...
py3 example (for py2, just change the format syntax):
import json
import requests

main_cat = "WP7115916"
sub_cat = "4836"
x = requests.get(f"https://www.x.com/product/tableview/GRAINGER-APPROVED-Square-Head-Plugs-{main_cat}/_/N-qu1?searchRedirect=products&breadcrumbCatId={sub_cat}&s_pp=false").json()
for p in x['records']:
    for child in p['children']:
        for item in json.loads(child['collapseValues']):
            url = f"https://www.x.com/product/lol-{item['sku']}"
            print(url)
https://www.x.com/product/lol-5P540
https://www.x.com/product/lol-5P541
https://www.x.com/product/lol-5P542
https://www.x.com/product/lol-5P543
https://www.x.com/product/lol-5P544
https://www.x.com/product/lol-5P545
https://www.x.com/product/lol-5P546
https://www.x.com/product/lol-5P547
https://www.x.com/product/lol-5P548
...
One of the best methods for smooth scrolling...
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

html = driver.find_element(By.XPATH, '//body')
total_scrolled = 0
page_height = driver.execute_script("return document.body.scrollHeight")
while total_scrolled < page_height:
    html.send_keys(Keys.PAGE_DOWN)
    total_scrolled += 400  # rough height of one PAGE_DOWN step
    time.sleep(.5)

Selenium Python: How to wait for a page to load after a click?

I want to grab the page source of the page after I make a click. And then go back using browser.back() function. But Selenium doesn't let the page fully load after the click and the content which is generated by JavaScript isn't being included in the page source of that page.
element[i].click()
#Need to wait here until the content is fully generated by JS.
#And then grab the page source.
scoreCardHTML = browser.page_source
browser.back()
As Alan mentioned, you can wait for some element to be loaded. Below is example code:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "element_id")))
You can also use Selenium's staleness_of, wrapped in a context manager (the @contextmanager decorator and the WebDriverWait import are needed for this to work):
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of

@contextmanager
def wait_for_page_load(browser, timeout=30):
    old_page = browser.find_element_by_tag_name('html')
    yield
    WebDriverWait(browser, timeout).until(
        staleness_of(old_page)
    )
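A usage sketch with the variables from the question (the with block performs the navigation, and the staleness wait runs on exit):

with wait_for_page_load(browser, timeout=30):
    element[i].click()  # navigation happens inside the block
# The old <html> element is now stale, so the new page has loaded.
scoreCardHTML = browser.page_source
browser.back()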
You can also do it with a loop of try-and-wait, an easy method to implement:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("url")
Button = ''
while not Button:
    try:
        Button = browser.find_element_by_name('NAME OF ELEMENT')
        Button.click()
    except:
        continue
Assuming "pass" is an element on the current page that won't be present on the target page. I mostly use the id of the link I am going to click on, because it is rarely present on the target page.
while True:
    try:
        browser.find_element_by_id("pass")
    except:
        break
