How to wait for an element to be filled with text - Python

I use Selenium with Python on a page whose data is filled in by JavaScript and refreshed every 10 seconds (that part doesn't matter much, since I will only run this once a week). I want to wait until the td with id='e5' gets its value and is filled in, or rather until the site has fully loaded. The site address is the URL used in the code below.
But I can't find a suitable expected condition for this job:
driver = webdriver.Firefox()
driver.get('http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=2400322364771558')
WebDriverWait(driver, 10).until(EC.staleness_of((By.ID, 'e5')))
print(driver.find_element_by_id('e5').text)
driver.close()
This is the tag I mean, in case you can't find it:

There is an Expected Condition for that:
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'e5')))
You don't need to wait for the element to become stale; you need it to be visible in the DOM.
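If the element is visible from the start but its text only arrives later via JavaScript, a custom wait condition is another option. A minimal sketch (the class name `element_text_is_non_empty` is made up for illustration; it relies only on the standard `driver.find_element` and `.text` API):

```python
class element_text_is_non_empty:
    """Custom expected condition: returns the element once its text is
    non-empty, otherwise False so WebDriverWait keeps polling."""

    def __init__(self, locator):
        self.locator = locator

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        return element if element.text.strip() else False

# usage sketch, mirroring the question's code:
# WebDriverWait(driver, 10).until(element_text_is_non_empty((By.ID, 'e5')))
```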

Try this code to poll the element's text value:
import time

pause = 1  # interval between value checks, in seconds
field = driver.find_element_by_id('e5')
value = field.text
while True:
    if field.text != value:
        value = field.text
        print(value)
    time.sleep(pause)
If you want to use WebDriverWait, try:
field = driver.find_element_by_id('e5')
value = field.text
while True:
    WebDriverWait(driver, float('inf')).until(lambda driver: field.text != value)
    value = field.text
    print(value)


How to not wait for a page to fully load, selenium python [duplicate]

I have a script that should take around 360 hours to complete, mostly because of the slow servers of the website I'm trying to scrape. But when I watch the website and the Python console at the same time, I can see that the elements I'm trying to use have already loaded, and Selenium is still waiting for the useless ads and other things I don't care about. So I was wondering if there is any way to start scraping as soon as the needed elements are loaded.
Another way of doing this would be to scrape even though the page has not finished loading, timing it by hand with time.sleep. That approach has already been asked and answered on Stack Overflow, so if it is the only way, let me know in the comments; a better way would be to wait only for the elements that need to be scraped, which would make this much easier.
I don't think my code is needed to answer my question, but I'll put it here just in case.
code:
# C:\Users\keibo\PycharmProjects\emergency ahanonline project
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium import webdriver
import pandas as pd
from webdriver_manager.chrome import ChromeDriverManager
import time

t = time.localtime()
current_time = time.strftime("%H:%M:%S", t)
print(f'[{current_time}] Started.')
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# options.add_argument("--headless")
output = f'\nState, City, Group, Sub_Group, Address, Website, Description, Views'
browser = webdriver.Chrome(options=options, service=Service(ChromeDriverManager().install()))

def tir():
    global output
    browser.get(
        'https://senf.ir/ListCompany/75483/%D8%A2%D9%87%D9%86-%D8%A2%D9%84%D8%A7%D8%AA-%D9%88-%D8%B6%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA')
    browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11").click()
    pages = (browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_9").text)
    print(f'There are {pages} pages of 20 names which means there is {pages*20} people to save.')
    for page in range(pages-1):
        for person in range(19):
            browser.get(
                'https://senf.ir/ListCompany/75483/%D8%A2%D9%87%D9%86-%D8%A2%D9%84%D8%A7%D8%AA-%D9%88-%D8%B6%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA')
            browser.find_element(By.ID, f"ContentPlaceHolder2_grdProduct_HpCompany_{person}").click()
            try:
                state = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_0"])').text
                if state == '' or state == ' ':
                    state = None
            except:
                state = None
            try:
                city = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_1"])').text
                if city == '' or city == ' ':
                    city = None
            except:
                city = None
            try:
                group = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_2"])').text
                if group == '' or group == ' ':
                    group = None
            except:
                group = None
            try:
                sub_group = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_3"])').text
                if sub_group == '' or sub_group == ' ':
                    sub_group = None
            except:
                sub_group = None
            try:
                Address = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_txtAddress"])').text
                if Address == '' or Address == ' ':
                    Address = None
            except:
                Address = None
            try:
                ceo = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_LblManager"])').text
                if ceo == '' or ceo == ' ':
                    ceo = None
            except:
                ceo = None
            # print(browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_ImgEmail"])').text)
            try:
                website = str(browser.find_element(By.XPATH, '(.//a[@id = "ContentPlaceHolder2_hfWebsait"])').text)
                if website == '' or website == ' ':
                    website = None
            except:
                website = None
            try:
                Description = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_lblDesc"])').text
                if Description == '' or Description == ' ':
                    Description = None
            except:
                Description = None
            try:
                views = browser.find_element(By.XPATH, '(.//span[@id = "ContentPlaceHolder2_lblVisit"])').text
                if views == '' or views == ' ':
                    views = None
            except:
                views = None
            output += f'\n{views}, {Description}, {website}, {Address}, {sub_group}, {group}, {city}, {state}'
            print(output)
            print('--------------------------------------------')
        browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_12").click()

tir()
print("End")
with open('Program Files\CSV pre built.txt') as f1:
    file1 = open("Program Files\CSV pre built.txt", "w")
    file1.write(output)
    file1.close()
read_file1 = pd.read_csv('Program Files\CSV pre built.txt')
read_file1.to_csv('Output.csv', index=False)
try:
    pass
except Exception as e:
    browser.close()
    print('something went wrong ):')
    sees = input('Press enter to leave or press 1 and than enter to see error: ')
    if sees == '1':
        input(e)
If you want to prioritize locating specific elements over the whole page, try using an explicit wait. If you want to wait for the whole webpage, use an implicit wait.
Explicit Wait
Where element is a desired search param & driver is your WebDriver:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
WebDriverWait(driver, timeout=5).until(lambda d: d.find_element(By.ID, 'element'))
Rather than wait for the entire webpage to load, this particular function waits for a given amount of time to find an element. For this scenario, the function waits for a maximum of 5 seconds to find the element with an ID of "element." You can assign a variable to this function to store the element it finds (if it is valid and discovered).
--
Implicit Wait
You mentioned using the sleep function time.sleep() to wait for the webpage to load. Selenium offers a method called Implicit Waiting for this. Rather than manually triggering a halt in the program, Selenium allows the driver to wait up to an imposed amount of time. The code is shown below, where driver is your WebDriver:
driver.implicitly_wait(5)
It is generally advised not to use time.sleep() as it "defeats the purpose of Automation." A more detailed explanation regarding implicit/static waits can be found in this post.
--
Explicit Wait Example
For a more direct answer to your question, we can apply an explicit wait to line 21 of your code snippet.
browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11").click()
Explicit waits can store and be applied to element searches. While variable declaration for element searches is not necessary, I highly recommend doing so for elements that require more than one applied action. The above code will be replaced like so:
pagelink = WebDriverWait(browser, timeout=10).until(lambda b: b.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11"))
pagelink.click()
This explicit wait function allows a grace period of up to 10 seconds to find the element's ID. For a more versatile use, the element is stored in the variable pagelink. Selenium then performs the click action on the element.
--
Implicit Wait Example
Rather than apply waits to every single element, implicit waits are used for every page that loads. Let's apply this between lines 27 and 28 of your code where it declares:
browser.get(
'https://senf.ir/ListCompany/75483/...')
browser.find_element(By.ID, f"ContentPlaceHolder2...").click()
Directly after the get function, we can use an implicit wait for Selenium to wait for when elements load:
browser.get("https://senf.ir/ListCompany/...")
browser.implicitly_wait(10)
browser.find_element(By.ID, f"ContentPlaceHolder2...").click()
With an implicit wait set, every find_element call waits up to 10 seconds for its element to appear in the DOM. If the element shows up sooner, the call returns immediately; if the 10 seconds are exceeded, a NoSuchElementException is raised.
--
Documentation for Explicit and Implicit Waits can be found here.
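One more option worth mentioning, since the real complaint is that `driver.get()` blocks until ads and images finish loading: Selenium's page load strategy. A configuration sketch, assuming Selenium 4 (where this is the `page_load_strategy` attribute; in Selenium 3 it was the `pageLoadStrategy` desired capability):

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
# "normal" (default) waits for the full load event,
# "eager" lets get() return at DOMContentLoaded, before images/ads finish,
# "none" returns as soon as the initial HTML is received.
options.page_load_strategy = "eager"

browser = webdriver.Chrome(options=options)
# get() now returns earlier; pair it with the explicit waits above so the
# specific elements you scrape are actually present before you touch them.
```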

How do I access the 2nd element with the same xpath in python in selenium

What I mean is that the website I'm using has two drop-down menus named province with the exact same id, so how do I tell Python which drop-down menu I want to select? Of course, this assumes the issue is that Python always picks the first id it sees.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time  # needed for time.sleep below

# There are two drop-down menus with the same XPath. The first time it works
# fine; the second time it throws an "element not interactable" error.
Prov = Select(web.find_element_by_xpath('//*[@id="province"]'))
Prov.select_by_index(2)

def Start():
    # once opened it will fill in the confirm-your-age form
    Day = Select(web.find_element_by_xpath('//*[@id="bday_day"]'))
    Day.select_by_index(2)
    Month = Select(web.find_element_by_xpath('//*[@id="bday_month"]'))
    Month.select_by_index(4)
    Month = Select(web.find_element_by_xpath('//*[@id="bday_year"]'))
    Month.select_by_index(24)
    Prov = Select(web.find_element_by_xpath('//*[@id="province"]'))
    Prov.select_by_index(5)
    Button = web.find_element_by_xpath('//*[@id="popup-subscribe"]/button')
    Button.click()

# have to go through select your birthday
Start()
# 2 seconds is enough for the website to load
time.sleep(2)
# this throws an error
Prov = Select(web.find_element_by_xpath('//*[@id="province"]'))
Prov.select_by_index(5)
Selenium has two kinds of functions:
find_element_by_... - without s in the word element - to get only the first element
find_elements_by_... - with s in the word elements - to get all matching elements
Selenium doc: 4. Locating Elements
So you can get all the elements as a list (even if there is only one element in the HTML); if there are no matching elements, you get an empty list.
elements = web.find_elements_by_xpath('//*[@id="province"]')
and later slice it
first = elements[0]
second = elements[1]
last = elements[-1]
list_first_and_second = elements[:2]
EDIT:
You can also try to index directly in the XPath
(it starts counting at one, not zero):
'//*[@id="province"][2]'
or maybe
'(//*[@id="province"])[2]'
but I have never used this, so I can't confirm it will work.
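The one-based counting is easy to check without a browser: Python's built-in ElementTree supports the same positional predicate, so a quick stdlib-only sketch (no Selenium involved) confirms that `[2]` means the second match:

```python
import xml.etree.ElementTree as ET

# two sibling <select> elements, like the duplicated id="province" case
root = ET.fromstring(
    '<form>'
    '<select id="province"><option>A</option></select>'
    '<select id="province"><option>1</option></select>'
    '</form>'
)

second_by_slice = root.findall('select')[1]   # Python list: zero-based
second_by_xpath = root.find('select[2]')      # XPath predicate: one-based
assert second_by_slice is second_by_xpath     # both name the same element
```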
BTW:
All IDs should be unique - you shouldn't duplicate IDs.
If you check the documentation 4. Locating Elements, you'll see there is find_element_by_id - without the s in element - to get the first (and supposedly only) element with a given ID, but there is no find_elements_by_id - with the s - to get more than one element with the same ID.
EDIT:
Minimal working code with example HTML in code
from selenium import webdriver
from selenium.webdriver.support.ui import Select
html = '''
<select id="province">
    <option value="value_A">A</option>
    <option value="value_B">B</option>
</select>
<select id="province">
    <option value="value_1">1</option>
    <option value="value_2">2</option>
</select>
'''
driver = webdriver.Firefox()
driver.get("data:text/html;charset=utf-8," + html)
all_elements = driver.find_elements_by_xpath('//*[@id="province"]')
first = all_elements[0]
second = all_elements[1]
prov1 = Select(first)
prov2 = Select(second)
print('--- first ---')
for item in prov1.options:
print('option:', item.text, item.get_attribute('value'))
for item in prov1.all_selected_options:
print('selected:', item.text, item.get_attribute('value'))
print('--- second ---')
for item in prov2.options:
print('option:', item.text, item.get_attribute('value'))
for item in prov2.all_selected_options:
print('selected:', item.text, item.get_attribute('value'))
EDIT:
There are two province elements.
When you use find_element in Start, you get the first province - the one in the popup - and you can fill it. When you click the button, the popup closes, but the first province is not removed from the HTML - it is only hidden.
Later, when you use find_element again, you again get that first province in the now-hidden popup - this time it is not visible, it can't be used, and that raises the error. You have to use the second province, as in this example.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time

def Start():
    # once opened it will fill in the confirm-your-age form
    Day = Select(web.find_element_by_xpath('//*[@id="bday_day"]'))
    Day.select_by_index(2)
    Month = Select(web.find_element_by_xpath('//*[@id="bday_month"]'))
    Month.select_by_index(4)
    Month = Select(web.find_element_by_xpath('//*[@id="bday_year"]'))
    Month.select_by_index(24)
    # it uses the first `province`
    Prov = Select(web.find_element_by_xpath('//*[@id="province"]'))
    Prov.select_by_index(5)
    Button = web.find_element_by_xpath('//*[@id="popup-subscribe"]/button')
    Button.click()

web = webdriver.Firefox()
web.get('https://www.tastyrewards.com/en-ca/contest/fritolaycontest/participate')
# have to go through select your birthday
Start()
# 2 seconds is enough for the website to load
time.sleep(2)
# `find_elements` with `s` - to get the second `province`
all_province = web.find_elements_by_xpath('//*[@id="province"]')
second_province = all_province[1]
Prov = Select(second_province)
Prov.select_by_index(5)

Open link in new tab instead of clicking the element found by class name | Python

This is the link:
https://www.unibet.eu/betting/sports/filter/football/matches
Using a Selenium driver, I access this link.
The actual task for me is to click on each of the match links. I found all those matches by
elems = driver.find_elements_by_class_name('eb700')
When I did this:
for elem in elems:
    elem.click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
The first time it clicked, loaded the new page, and went back to the previous page, but then it gave the following error:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I also tried getting the href attribute from the elem, but it gave None. Is it possible to open the page in a new tab instead of clicking the elem?
You can retry the click on the element, since after the navigation it is no longer present in the DOM.
Code:
driver = webdriver.Chrome("C:\\Users\\**\\Inc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.unibet.eu/betting/sports/filter/football/matches")
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "OK"))).click()
sleep(2)
elements = driver.find_elements(By.XPATH, "//div[contains(@class,'_')]/div[@data-test-name='accordionLevel1']")
element_len = len(elements)
print(element_len)
counter = 0
while counter < element_len:
    attempts = 0
    while attempts < 2:
        try:
            ActionChains(driver).move_to_element(elements[counter]).click().perform()
        except:
            pass
        attempts = attempts + 1
    sleep(2)
    # driver.execute_script("window.history.go(-1)")  # maybe get the team name
    # using the //div[@data-test-name='teamName'] xpath
    sleep(2)
    # driver.refresh()
    sleep(2)
    counter = counter + 1
Since you move to the next page, the elements no longer exist in the DOM, so you get the stale element exception.
What you can do is get all the links (elems) again when coming back to the same page, and use a while loop instead of a for loop.
elems = driver.find_elements_by_class_name('eb700')
i = 0
while i < len(elems):
    elems[i].click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    elems = driver.find_elements_by_class_name('eb700')
    i += 1
The other solution is to remain on the same page, save all the href attributes in a list, and then use driver.get to open each match link.
matchLinks = []
elems = driver.find_elements_by_class_name('eb700')
for elem in elems:
    matchLinks.append(elem.get_attribute('href'))
for match in matchLinks:
    driver.get(match)
    # do whatever you want to do on the match page
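To also answer the "open in a new tab" part of the question: Selenium has no built-in new-tab click, but you can open each collected href in a fresh tab with a small JavaScript call and switch the driver to it. A sketch with a hypothetical helper (`open_in_new_tab` is not a Selenium API; it assumes only the standard `execute_script`, `window_handles`, and `switch_to.window` members):

```python
def open_in_new_tab(driver, url):
    """Open url in a new browser tab and switch the driver to it.

    Returns the handle of the original tab so the caller can switch back.
    """
    original = driver.current_window_handle
    driver.execute_script("window.open(arguments[0], '_blank');", url)
    # the newest handle in window_handles is the tab we just opened
    driver.switch_to.window(driver.window_handles[-1])
    return original

# usage sketch:
# original = open_in_new_tab(driver, match)
# ...scrape the match page...
# driver.close()                     # close the match tab
# driver.switch_to.window(original)  # back to the list page
```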

visibility_of_element_located or element_to_be_clickable - Final solution?

I've read a little about Selenium's ways of checking whether an element is visible on the page (to the user), so that it can be hovered over or clicked.
We have options:
1. visibility_of_element_located
An expected condition for checking that the element located by the locator is present in the DOM and visible on the page. Visibility means that the element is not only displayed but also has a height and width greater than 0.
2. element_to_be_clickable
Wait until the element identified by the locator is enabled and visible such that you can click on it. Note that the element should be in a visible state.
And here's the question.
I have a page:
https://tvn24.pl/
in which I have to hover over the element
"button.account-standard__toggle-button"
so that the "frame" unfolds, and then click on the element
"div.account-standard__popup button.account-content__button.account-content__button--large"
I would like the click to happen after waiting for the element to be visible to the user - which is only possible after hovering over the first element. But:
What if the second button is already in the DOM and already has a height and width greater than 0, even though I never hovered over the first button? The condition would be met even though the user doesn't actually see it.
Isn't the second way (element_to_be_clickable) better in this case?
And what about the situation where the hover happens, but for some reason the expanded frame disappears or collapses before Selenium finds and clicks the second button?
Below is a screenshot of hovering over the button (upper right corner - a button depicting a person) and, below it, the expanded frame with a "log in" button.
The selection in the inspector is on the second button, the one that appears after hovering over the first.
These layout values are constant the whole time; they do not change whether I hover the mouse over the first button or not.
Unless I have something wrong with the developer tools.
I wrote the functions below earlier, but I am afraid they don't account for the case where, after hovering over the first button, something loads on the page and the expandable element disappears before the driver finds the second button and clicks it.
I know I can use wait.until(EC.visibility_of...), but I want to avoid a possible timeout exception.
As you can see in this picture, when we hover over button 1, the topmost div for this li gains a class containing "--visible".
def move_to_login_page_from_main_page(driver):
    wait = WebDriverWait
    CSS_account = "button.account-standard__toggle-button"
    CSS_login = "div.account-standard__popup button.account-content__button.account-content__button--large"
    CSS_roll_out_frame = "div.account-standard--visible"
    expected_url = "account.tvn.pl"
    attempts = 0
    while attempts <= 10:
        account = driver.find_element(By.CSS_SELECTOR, CSS_account)
        ac(driver).move_to_element(account).perform()
        roll_out_frames = driver.find_elements(By.CSS_SELECTOR, CSS_roll_out_frame)
        if len(roll_out_frames) > 0:
            # find_elements returns a list, so search within the first match
            log_in_button = roll_out_frames[0].find_element(By.CSS_SELECTOR, CSS_login)
            ac(driver).move_to_element(log_in_button).click(log_in_button).perform()
            wait(driver, 10, 1).until(ec.url_contains(expected_url))
            break
        else:
            time.sleep(1)
            attempts += 1

def move_to_login_page_from_main_page2(driver):
    wait = WebDriverWait
    CSS_account = "button.account-standard__toggle-button"
    CSS_login = "div.account-standard__popup button.account-content__button.account-content__button--large"
    expected_url = "account.tvn.pl"
    attempts = 0
    while attempts <= 10:
        account = driver.find_element(By.CSS_SELECTOR, CSS_account)
        ac(driver).move_to_element(account).click(account).perform()
        login_button = driver.find_element(By.CSS_SELECTOR, CSS_login)
        if login_button.is_enabled():
            ac(driver).move_to_element(login_button).click(login_button).perform()
            wait(driver, 10, 1).until(ec.url_contains(expected_url))
            break
        else:
            time.sleep(0.5)
            attempts += 1
This code shows different conditions of your element:
import time
from selenium import webdriver
from selenium.webdriver import ActionChains as AC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
base_url = 'https://tvn24.pl/'
driver = webdriver.Chrome(executable_path=rf"chromedriver.exe")
driver.maximize_window()
driver.get(base_url)
cookie_button = WebDriverWait(driver, 5, 1).until(EC.visibility_of_element_located((By.ID,'onetrust-accept-btn-handler')))
cookie_button.click()
CSS_account = "button.account-standard__toggle-button"
login_xpath = '//div[@class="account-standard__popup"]//button[@class="account-content__button account-content__button--large"]'
def check_elem_state(xpath):
    try:
        WebDriverWait(driver, 1, 1).until(EC.presence_of_element_located((By.XPATH, login_xpath)))
        print('Element present')
    except:
        print('Sorry - not present')
    try:
        WebDriverWait(driver, 1, 1).until(EC.visibility_of_element_located((By.XPATH, login_xpath)))
        print('Element visible')
    except:
        print('Sorry - not visible')
    try:
        WebDriverWait(driver, 1, 1).until(EC.element_to_be_clickable((By.XPATH, login_xpath)))
        print('Element clickable')
    except:
        print('Sorry - not clickable')
print('-------Without slider--------')
check_elem_state(login_xpath)
login = driver.find_element(By.XPATH, login_xpath)
print("Login is displayed: ", login.is_displayed())
print('-------Showing slider-------')
account = driver.find_element(By.CSS_SELECTOR, CSS_account)
AC(driver).move_to_element(account).click(account).perform()
check_elem_state(login_xpath)
print("Login is displayed: ", login.is_displayed())
login.click()
The output:
-------Without slider--------
Element present
Sorry - not visible
Sorry - not clickable
Login is displayed: False
-------Showing slider-------
Element present
Element visible
Element clickable
Login is displayed: True
This way you can check whether the element is both visible and clickable, and perform any action, as long as you wrap the wait in try/except.
The only remark here is to use the proper expected exception - in the example above any exception is caught, so you should catch the specific one instead.
Edit:
In console you can check it this way:
function isVisible(elem) {
    if (!(elem instanceof Element)) throw Error('DomUtil: elem is not an element.');
    const style = getComputedStyle(elem);
    if (style.display === 'none') { console.log('display'); return false; }
    if (style.visibility !== 'visible') { console.log('visibility'); return false; }
    if (style.opacity < 0.1) { console.log('opacity'); return false; }
    if (elem.offsetWidth + elem.offsetHeight + elem.getBoundingClientRect().height +
        elem.getBoundingClientRect().width === 0) {
        console.log('client'); return false;
    }
    const elemCenter = {
        x: elem.getBoundingClientRect().left + elem.offsetWidth / 2,
        y: elem.getBoundingClientRect().top + elem.offsetHeight / 2
    };
    if (elemCenter.x < 0) { console.log('x<0'); return false; }
    if (elemCenter.x > (document.documentElement.clientWidth || window.innerWidth)) { console.log('width'); return false; }
    if (elemCenter.y < 0) { console.log('y<0'); return false; }
    if (elemCenter.y > (document.documentElement.clientHeight || window.innerHeight)) { console.log('height'); return false; }
    let pointContainer = document.elementFromPoint(elemCenter.x, elemCenter.y);
    do {
        if (pointContainer === elem) { console.log('point container'); return true; } else { console.log(pointContainer); }
    } while (pointContainer = pointContainer.parentNode);
    return false;
}
And trigger the function:
isVisible(document.getElementsByClassName('account-content__button--large')[1])
Your locator for the dropdown element is not unique; it is matching some other element. Note that if you wait for visibility of the dropdown content without first clicking the dropdown button, the wait will time out. Use:
driver.get('https://tvn24.pl/')
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,
'#onetrust-accept-btn-handler'))).click()
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,
".account-standard__toggle-button"))).click()
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,'[class="account-standard__popup"] button[class="account-content__button account-content__button--large"]')))
This will first wait for the accept-cookies button and click OK, then click the hover button, and then wait for the displayed dropdown button.
Strategy for finding a unique locator:
If you are not able to get a unique property for an element, find a unique sibling, parent, or child, and reference the element relative to it.
In this case:
[class="account-standard__popup"]
This parent element is a unique reference you can use to locate the child element:
button[class="account-content__button account-content__button--large"]
To answer your question:
See the element_to_be_clickable implementation below - it first checks for visibility.
So with visibility_of_element_located, Selenium won't call is_enabled(); with element_to_be_clickable it additionally checks that the button is enabled, meaning it ensures the element doesn't carry the disabled HTML attribute.
If the button is enabled or disabled via a CSS class rather than the HTML disabled attribute, then there is no difference between using visibility and clickable,
because is_enabled() does not detect a disabled state expressed through CSS, only through the HTML disabled attribute.
class element_to_be_clickable(object):
    """An expectation for checking an element is visible and enabled such
    that you can click it."""

    def __init__(self, locator):
        self.locator = locator

    def __call__(self, driver):
        element = visibility_of_element_located(self.locator)(driver)
        if element and element.is_enabled():
            return element
        else:
            return False
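Building on the question's own observation that the wrapper gains an `account-standard--visible` class on hover, you can also hand `WebDriverWait` a plain function tied to that exact CSS state, so a popup that collapses again simply makes the wait keep polling. A sketch (the selector comes from the question; `find_elements('css selector', ...)` assumes the Selenium 4 `find_elements(by, value)` signature):

```python
def popup_is_open(driver):
    """Truthy only while the expanded wrapper carries the --visible class;
    returns the wrapper element, or False so WebDriverWait keeps polling."""
    wrappers = driver.find_elements('css selector', 'div.account-standard--visible')
    return wrappers[0] if wrappers else False

# usage sketch - hover first, then:
# wrapper = WebDriverWait(driver, 10, 0.2).until(popup_is_open)
# wrapper.find_element(By.CSS_SELECTOR, CSS_login).click()
```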

Python selenium: running into StaleElementReferenceException

I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.
binary = FirefoxBinary('path_to_firebox_binary.exe')
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())
base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
           '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType= '
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
    By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()
# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)
results = {}
for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
    results[index] = {'title': title, 'company': emp_name, 'description': description}
I keep running into the error message
selenium.common.exceptions.StaleElementReferenceException: Message:
The element reference of is stale; either the element is no longer attached to the
DOM, it is not in the current frame context, or the document has been
refreshed
for the first line inside my for loop. Even when the loop runs for a number of iterations, it eventually raises this exception. I am new to Selenium and web scraping and would appreciate any help.
Every time a new post is selected, the clicked element is modified and therefore the DOM is refreshed. The change is slow, certainly in comparison to the actions in the loop, so you want to slow things down a little. Instead of using a fixed sleep, you can wait for the changes to occur.
Every time you select a posting, a new class selected is added to it and its style attribute loses its content. You should wait for this to happen, get the information, and then click the next post.
wait = WebDriverWait(driver, 20)
for index in range(n_listings - 1):
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    if index < n_listings - 1:
        driver.find_element_by_css_selector('.selected + .jl').click()
This error means the element you are trying to click on was no longer found; you have to first make sure the target element exists before calling click(), or wrap the call in a try/except block.
# ...
results = {}
for index in range(n_listings):
    try:
        driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    except:
        print('Listing not found, retrying in 1 second ...')
        time.sleep(1)
        continue
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
# ...
