Selenium (Python): how to best handle a page anomaly

I am scraping pages of the Italian website that publishes new laws (Gazzetta Ufficiale) in order to save the final page, which holds the law text.
I have a loop that builds a list of the pages to download, and I am attaching a fully working code sample which shows the problem I'm running into (the sample is not looped; I am just doing two "gets").
What is the best way to handle the rare page which does not show the "Visualizza" (show) button but goes straight to the desired full text?
I hope the code is pretty self-explanatory and well commented. Thank you in advance, and a super happy 2022!
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome("/Users/bob/Documents/work/scraper/scrape_gu/chromedriver")

# showing the "normal" behaviour
driver.get(
    "https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario"
)
# this page has a "Visualizza" button, find it and click it.
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5)  # just to see the "normal" result with the "Visualizza" button
bottoni[0].click()  # now click it and this shows the desired final webpage
time.sleep(5)  # just to see the "normal" desired result

# but unfortunately some pages directly get to the end result WITHOUT the "Visualizza" button.
# as an example see the following get
driver.get(
    "https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario"
)  # get a law page
time.sleep(5)  # as you can see we are now on the final desired full page WITHOUT the Visualizza button

# hence the following code, identical to that above, will fail and time out
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5)  # just to see the result
bottoni[0].click()  # and this shows the desired final webpage

# and the program abends with the following message
# File "/Users/bob/Documents/work/scraper/scrape_gu/temp.py", line 33, in <module>
#   bottoni = WebDriverWait(driver, 10).until(
# File "/Users/bob/opt/miniconda3/envs/scraping/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 80, in until
#   raise TimeoutException(message, screen, stacktrace)
# selenium.common.exceptions.TimeoutException: Message:

Catch the exception with a try and except block: if there is no button, extract the text directly (see Handling Exceptions).
...
urls = [
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario',
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario'
]
data = []
for url in urls:
    driver.get(url)
    try:
        bottoni = WebDriverWait(driver, 1).until(
            EC.element_to_be_clickable(
                (By.XPATH, '//input[@value="Visualizza"]')
            )
        )
        bottoni.click()
    except TimeoutException:
        print('no bottoni -')
    finally:
        data.append(driver.find_element(By.XPATH, '//body').text)
driver.close()
print(data)
...

First, using selenium for this task is overkill.
You'd be able to do the same thing using requests or aiohttp coupled with beautifulsoup, except that would be much faster and easier to code.
Now to get back to your question, there are a few solutions.
The simplest would be:
Catch the timeout exception: if the button isn't found, then go straight to parsing the law.
Check if the button is present before either clicking on it or parsing the web page; in Python that would be something like len(driver.find_elements(By.ID, "corpo_export")) > 0 (the snippet !driver.findElements(By.id("corpo_export")).isEmpty() is the Java equivalent).
But then again, you'd have a much easier time getting rid of selenium and using beautifulsoup instead.
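For illustration, here is a minimal sketch of that requests + BeautifulSoup idea (untested against the live site, so treat the selector and the form handling as assumptions): fetch the menu page, and if the "Visualizza" button is present, submit its form to reach the full text; otherwise the text is already there.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_law_text(url):
    # Fetch the menu page and check whether the "Visualizza" button is present.
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, "html.parser")
    button = soup.select_one('input[value="Visualizza"]')
    if button is not None:
        # The button sits inside a form; submit that form to reach the full text.
        form = button.find_parent("form")
        action = urljoin(url, form.get("action", ""))
        fields = {i["name"]: i.get("value", "")
                  for i in form.find_all("input") if i.get("name")}
        if form.get("method", "get").lower() == "post":
            resp = requests.post(action, data=fields)
        else:
            resp = requests.get(action, params=fields)
        soup = BeautifulSoup(resp.text, "html.parser")
    # Either way, the law text should now be in the page body.
    return soup.body.get_text("\n", strip=True)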

Related

Selenium. Unable to locate element from the html website

Here's the link to the website I'm trying to scrape (I'm just practising for the moment, nothing fancy):
link
Here's my script; it's quite long but nothing too complicated:
from selenium import webdriver
import time

if __name__ == "__main__":
    print("Web Scraping application started")
    PATH = "driver\chromedriver.exe"
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1200,900")
    options.add_argument('enable-logging')
    driver = webdriver.Chrome(options=options, executable_path=PATH)
    driver.get('https://fr.hotels.com/')
    driver.maximize_window()
    destination_location_element = driver.find_element_by_id("qf-0q-destination")
    check_in_date_element = driver.find_element_by_id("qf-0q-localised-check-in")
    check_out_date_element = driver.find_element_by_id("qf-0q-localised-check-out")
    search_button_element = driver.find_element_by_xpath('//*[@id="hds-marquee"]/div[2]/div[1]/div/form/div[4]/button')
    print('Printing type of search_button_element')
    print(type(search_button_element))
    destination_location_element.send_keys('Paris')
    check_in_date_element.clear()
    check_in_date_element.send_keys("29/05/2021")
    check_out_date_element.clear()
    check_out_date_element.send_keys("30/05/2021")
    close_date_window = driver.find_element_by_xpath('/html/body/div[7]/div[4]/button')
    print('Printing type of close_date_window')
    print(type(close_date_window))
    close_date_window[0].click()
    search_button_element.click()
    time.sleep(10)
    hotels = driver.find_element_by_class_name('hotel-wrap')
    print("\n")
    i = 1
    for hotel in hotels:
        try:
            print(hotel.find_element_by_xpath('//*[@id="listings"]/ol/li[' + str(i) + ']/article/section/div/h3/a').text)
            print(hotel.find_element_by_xpath('//*[@id="listings"]/ol/li[' + str(i) + ']/article/section/div/address/span').text)
        except Exception as ex:
            print(ex)
            print('Failed to extract data from element.')
        i = i + 1
        print('\n')
    driver.close()
    print('Web Scraping application completed')
And here's the error I get :
File "hotelscom.py", line 21, in <module>
destination_location_element = driver.find_element_by_id("qf-0q-destination")
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="qf-0q-destination"]"}
(Session info: chrome=90.0.4430.85)
Any idea how to fix that? I don't understand why it gives me this error, because this syntax is present in the HTML code. But I guess I'm wrong.
You have multiple problems with your code and the site.
SITE PROBLEMS
1. The site is served from multiple servers, and different servers have different HTML code. I do not know whether it depends on location or not.
2. The version I have a solution for has a few serious bugs (or maybe those are features). Among them:
When you press Enter while a date field is open, it starts the hotel search even though you just want to close that date field. So it is a problem to close input fields in the traditional way.
Selenium's clear() does not work as it is supposed to.
BUGS IN YOUR CODE
1. You are defining the window size in options and you are also maximizing the window immediately after the site is opened. Use only one of the two.
2. You are entering dates like "29/05/2021", but the site only recognises formats like "05/30/2021". It is a big difference.
3. You are not using any waits, and they are extremely important.
4. Your locators are wrong and unstable. Even locators with an id did not always work for me, because if you make a search there are two elements for some of them. So I replaced them with CSS selectors.
Please note that my solution works only for an old version of the site. If you want a specific version to be opened you will need to either:
Get the site by a direct IP address, like driver.get('site ip address')
Implement a strategy in your framework which recognises which site version is opened and applies inputs depending on it.
SOLUTION
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

if __name__ == "__main__":
    print("Web Scraping application started")
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1200,900")
    options.add_argument('enable-logging')
    driver = webdriver.Chrome(options=options, executable_path='/snap/bin/chromium.chromedriver')
    driver.get('https://fr.hotels.com/')
    wait = WebDriverWait(driver, 15)

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-destination")))
    destination_location_element = driver.find_element_by_css_selector("#qf-0q-destination")
    destination_location_element.send_keys('Paris, France')
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".widget-autosuggest.widget-autosuggest-visible table tr")))
    destination_location_element.send_keys(Keys.TAB)  # workaround to close destination field
    driver.find_element_by_css_selector(".widget-query-sub-title").click()
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".widget-query-group.widget-query-destination [aria-expanded=true]")))

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-localised-check-in")))
    check_in_date_element = driver.find_element_by_css_selector("#qf-0q-localised-check-in")
    check_in_date_element.send_keys(Keys.CONTROL, 'a')  # workaround to replace clear() method
    check_in_date_element.send_keys(Keys.DELETE)  # workaround to replace clear() method
    # check_in_date_element.click()
    check_in_date_element.send_keys("05/30/2021")

    # wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#qf-0q-localised-check-out")))
    check_out_date_element = driver.find_element_by_id("qf-0q-localised-check-out")
    check_out_date_element.click()
    check_out_date_element.send_keys(Keys.CONTROL, 'a')
    check_out_date_element.send_keys(Keys.DELETE)
    check_out_date_element.send_keys("05/31/2021")
    driver.find_element_by_css_selector(".widget-query-sub-title").click()  # workaround to close end date

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hds-marquee button"))).click()
I spent a few hours on this; the task just seemed interesting to me.
It works for this version of the UI.
The code can still be optimised. It's up to you.
UPDATE:
I found out that the site has at least three home pages, with three different locators for the Destination and the other fields.
The easiest workaround that came to mind is something like this:
try:
    element = driver.find_element_by_css_selector("#qf-0q-destination")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-destination")))
        destination_location_element = driver.find_element_by_css_selector("#qf-0q-destination")
        print("making input to Destination field of site 1")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 1 not found")

try:
    element = driver.find_element_by_css_selector("input[name=q-destination-srs7]")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name=q-destination-srs7]")))
        destination_location_element = driver.find_element_by_css_selector("input[name=q-destination-srs7]")
        print("making input to Destination field of site 2")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 2 is not found")

try:
    element = driver.find_element_by_css_selector("form[method=GET]>div>._1yFrqc")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "form[method=GET]>div>._1yFrqc")))
        destination_location_element = driver.find_element_by_css_selector("form[method=GET]>div>._1yFrqc")
        print("making input to Destination field of site 3")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 3 is not found")
But the best solution would be to have direct access to a specific server that has only one version available.
Please also note that if you access the site by a direct link for France: https://fr.hotels.com/?pos=HCOM_FR&locale=fr_FR your input dates will be as you initially specified, for example 30/05/2021.
Try this
driver.find_element_by_xpath(".//div[contains(#class,'destination')]/input[#name='q-destination']")
Also please add wait after you maximize the window
You are missing a wait / sleep before finding the element.
So, just add this:
element = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.ID, "qf-0q-destination")))
element.click()
To use this you will have to add the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Selenium Not Finding Element Present in HTML Even After Waiting for DOM to update

I am trying to scrape information on a website where the information is not immediately present. When you click a certain button, the page begins to load new content at the bottom, and after it's done loading, red text shows up saying "Assists (At Least)". I am able to find the first button, "Go to Prop builder", which doesn't immediately show up on the page, but after the script clicks that button, it times out when trying to find the "Assists (At Least)" text, even though the script sleeps and the text is present on the screen.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://www.bovada.lv/sports/basketball/nba')

# this part succeeds
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//span[text()='Go to Prop builder']")
    )
)
element.click()
time.sleep(5)

# this part fails
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[text()='Assists (At Least)']")
    )
)
time.sleep(2)
innerHTML = driver.execute_script('return document.body.innerHTML')
driver.quit()
soup = BeautifulSoup(innerHTML, 'html.parser')
The problem is that the Assists element is inside a frame. You need to switch to the frame like this:
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME,"player-props-frame")))
driver.switch_to.frame(frame)
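Once you have grabbed what you need inside the frame, you can switch back to the main document before touching anything else on the page (this is the standard Selenium API, not specific to this site):
driver.switch_to.default_content()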
Increase the timeout to confirm that the timeout provided is long enough; you can also confirm this in debug mode. If the issue still persists, please check that the "Assists (At Least)" element does not fall under any frame.
You can also share the DOM and the exact error message if the issue is not resolved.
I have a couple of suggestions you could try:
Make sure that the content loaded at the bottom of the page is not in a frame. If it is, you need to switch to that particular frame.
Check that the XPath is correct; try whether the XPath matches from the Developer Console.
Inspect the element from the browser; once the Developer Console is open, press CTRL+F and then try your XPath. If it doesn't highlight anything, check for frames.
Check if there are any iframes in the page: search for "iframe" in the page source, and if the field you are looking for sits inside one, switch to that frame first.
driver.switch_to.frame("name of the iframe")
Try adding retry logic with a timeout, and click the refresh button if there is one on the page:
st = time.time()
while st + 180 > time.time():
    try:
        element2 = WebDriverWait(driver, 6).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//*[text()='Assists (At Least)']")
            )
        )
        break  # stop retrying once the element has been found
    except TimeoutException:
        pass
The content you want is in an iFrame. You can access it by switching to it first, like this:
iframe=driver.find_element_by_css_selector('iframe[class="player-props-frame"]')
driver.switch_to.frame(iframe)
The round brackets are the issue here (at least in some cases). If possible, use a contains() selector:
//*[contains(text(),'Assists ') and contains(text(),'At Least')]
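For example, plugging that expression into the wait from the question (same imports as above; the 6-second timeout is kept just for illustration):
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[contains(text(),'Assists ') and contains(text(),'At Least')]")
    )
)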

selenium python product load button not working

This page has a total of 790 products, and I wrote Selenium code to automatically click on the product load button until it finishes loading all 790 products. Unfortunately, my code is not working and I am getting an error. Here is my full code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.billigvvs.dk/maerker/grohe/produkter?min_price=1'
driver.get(url)
time.sleep(5)

# accept cookies
try:
    driver.find_element_by_xpath("//button[@class='coi-banner__accept']").click()
except:
    pass
    print('cookies not accepted')

# Wait 20 seconds for page to load.
timeout = 20
try:
    WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='productbox__info__name']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    driver.quit()

# my page load button not working. I want to load all 790 products on this page
products_load_button = driver.find_element_by_xpath("//div[@class='filterlist__button']").click()
The error that I am getting:
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='filterlist__button']"}
(Session info: chrome=87.0.4280.88)
The error message says Unable to locate element, but see the picture, which shows that I am selecting the right element.
You are missing an extra space at the end of the class name; try with this:
products_load_button = driver.find_element_by_xpath("//div[@class='filterlist__button ']").click()
When you work with selectors, it is always good practice to copy and paste them directly from the page; that will save a lot of headaches in the future.
Edit:
The while loop to check if all the elements are loaded looks similar to this:
progress_bar_text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
# From here you can extract the total items and the loaded items.
# Note: I am doing it this way because I don't have access to the page; probably
# there is a better way to find out whether the items are loaded, e.g. by taking
# a look at the attributes of the progress bar.
total_items = int(progress_bar_text.split()[4])
loaded_items = int(progress_bar_text.split()[1])

while loaded_items < total_items:
    # Click the product load button until the products are loaded
    products_load_button.click()
    # Get the progress bar text and update the loaded_items count
    progress_bar_text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
    loaded_items = int(progress_bar_text.split()[1])
This is a very simple example and does not consider a lot of scenarios that you will need to handle to make it stable; some of them are (a rough sketch that handles both follows this list):
The elements might disappear or reload after you click products_load_button. For this, I recommend that you take a look at explicit waits in the Selenium docs.
It is possible that the progress bar could disappear or be hidden after the load is complete.
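Here is that sketch, reusing the selectors from this answer and the question (they are not re-checked against the live page, so treat them as assumptions):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException

wait = WebDriverWait(driver, 10)
while True:
    try:
        progress_bar_text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
        loaded_items = int(progress_bar_text.split()[1])
        total_items = int(progress_bar_text.split()[4])
        if loaded_items >= total_items:
            break
        # Re-locate the button each time and wait until it is clickable again.
        wait.until(EC.element_to_be_clickable(
            (By.XPATH, "//div[@class='filterlist__button ']"))).click()
    except (TimeoutException, NoSuchElementException):
        break  # progress bar or button gone; assume loading has finished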

Python-Selenium scroll until certain text

I'm trying to have Selenium scroll a Facebook page until a certain text appears, then get the HTML tags from that page. I'm looking for the Facebook post date text and want Selenium to scroll until it reaches that point. This code doesn't throw an error, but it doesn't do the task either. How can I achieve this? Right now it keeps scrolling and doesn't stop.
I'm just trying to scroll the page until the text 'Oct 5th' is visible.
driver.get("https://www.facebook.com/search/latest/?q=%23blacklivesmatter")
sleep(4)
wait = WebDriverWait(driver, 10)
while True:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
try:
wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(text(), 'Oct 5th')]")))
html = driver.page_source
soup = BeautifulSoup(html)
except TimeoutException:
break
Edit: We need to look for the presence of an element instead of visibility.
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from time import sleep

driver = webdriver.Chrome()
driver.get("https://www.facebook.com/search/latest/?q=%23blacklivesmatter")
wait = WebDriverWait(driver, 10)

find_elem = None
scroll_from = 0
scroll_limit = 3000
while not find_elem:
    sleep(2)
    driver.execute_script("window.scrollTo(%d, %d);" % (scroll_from, scroll_from + scroll_limit))
    scroll_from += scroll_limit
    try:
        find_elem = wait.until(EC.presence_of_element_located((By.XPATH, "//*[contains(text(), 'Oct 5th')]")))
    except TimeoutException:
        pass
driver.close()
First of all, if the text you are looking for is somewhere on the page, even if it is not immediately visible, it should still be present in the HTML directly, without the need to scroll. Scrolling is only required when the page needs to be refreshed to load additional content that was not available before.
Now, I would suggest changing the following in your approach:
First of all, if the page does need to load some data that was unavailable before the scroll, you should give it enough time to do so. If you scroll and look for the text too quickly, it won't have enough time to get the updated HTML and you will basically just query the same DOM each time. Given that you don't necessarily know when your text will appear, you will have to wait for a constant hard-coded period each time. A few seconds should be enough, at least initially, just to prove that it works.
Just to exclude possible issues with using wait.until, try looking for this text directly in the HTML source. You can change it later and use wait.until once you are sure that the rest of your script works properly.
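A minimal sketch of that simplification (assuming the driver from the question is already on the search page; the text and the pause length are just examples):
from time import sleep

found = False
while not found:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(3)  # hard-coded pause so the newly loaded posts make it into the DOM
    found = "Oct 5th" in driver.page_source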

Wait until page is loaded with Selenium WebDriver for Python

I want to scrape all the data of a page implemented with an infinite scroll. The following Python code works.
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time efficient; the page may finish loading the new content well within 5 seconds. How can I detect whether the page has finished loading the new content every time I scroll down? If I could detect this, I could scroll down again to see more content as soon as I know the page has finished loading. This would be more time efficient.
The webdriver will wait for a page to load by default via the .get() method.
As you may be looking for some specific element, as @user227215 said, you should use WebDriverWait to wait for an element located on your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
I have used it for checking alerts. You can use any of the other By methods to find your locator.
EDIT 1:
I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it's your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.
Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
This matches the example in the documentation. Here is a link to the documentation for By.
Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
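For reference, a wait_for helper of that kind can be as simple as the sketch below (this is an assumption about its shape, not the exact code from the blog): it just polls a condition function until it returns something truthy or a timeout expires.
import time

def wait_for(condition_function, timeout=10, poll=0.5):
    # Poll condition_function until it returns a truthy value or the timeout expires.
    start = time.time()
    while time.time() < start + timeout:
        result = condition_function()
        if result:
            return result
        time.sleep(poll)
    raise Exception("Timed out waiting for condition")

# e.g. wait_for(page_has_loaded), or
# wait_for(lambda: driver.execute_script('return document.readyState;') == 'complete')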
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
import contextlib
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of

@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
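Typical usage then wraps the click that triggers the navigation, so the staleness check runs after it (the link text here is only a placeholder):
with self.wait_for_page_load(timeout=10):
    self.find_element_by_link_text('my link').click()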
As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find all the possible locators that can be used with By collected in one place, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID
Used in the example; finds elements by their HTML id attribute.
CLASS_NAME
Used to find elements by their HTML class attribute. Why is this function CLASS_NAME and not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.
CSS_SELECTOR
Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.
LINK_TEXT
Finds HTML tags by the text they contain. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT
Similar to LINK_TEXT, but matches on a partial string.
NAME
Finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME
Finds HTML tags by their tag name.
XPATH
Uses an XPath expression ... to select matching elements.
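Each of these is used the same way, as the first element of the locator tuple passed to an expected condition; for example (the selectors below are placeholders, and an existing driver instance is assumed):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#content .result")))
wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Next")))
wait.until(EC.presence_of_element_located((By.NAME, "username")))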
From selenium/webdriver/support/wait.py
driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))
On a side note, instead of scrolling down 100 times, you can check whether there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded):
import logging
import time

def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0," + str(value) + ")")

# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
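Usage is then just (the URL here is a placeholder, and an existing driver instance is assumed):
driver.get("https://example.com/lazy-loaded-page")
scrollDownAllTheWay(driver)
html = driver.page_source  # now also contains the lazily loaded content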
Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once in the session, and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds it will execute the command as soon as possible, waiting 10 seconds before it gives up. I've used this in similar scroll-down scenarios so I don't see why it wouldn't work in your case. Hope this is helpful.
Be sure to use a lowercase 'w' in implicitly_wait.
Here I did it using a rather simple form:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("url")
searchTxt = ''
while not searchTxt:
    try:
        searchTxt = browser.find_element_by_name('NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except:
        continue
Solution for AJAX pages that continuously load data. The previous methods stated do not work. What we can do instead is grab the page DOM, hash it, and compare the old and new hash values over a delta time.
import time
from selenium import webdriver

def page_has_loaded(driver, sleep_time=2):
    '''
    Waits for page to completely load by comparing current page hash values.
    '''
    def get_page_hash(driver):
        '''
        Returns html dom hash
        '''
        # can find element by either 'html' tag or by the html 'root' id
        dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
        # dom = driver.find_element_by_id('root').get_attribute('innerHTML')
        dom_hash = hash(dom.encode('utf-8'))
        return dom_hash

    page_hash = 'empty'
    page_hash_new = ''

    # comparing old and new page DOM hash together to verify the page is fully loaded
    while page_hash != page_hash_new:
        page_hash = get_page_hash(driver)
        time.sleep(sleep_time)
        page_hash_new = get_page_hash(driver)
        print('<page_has_loaded> - page not loaded')

    print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
How about putting WebDriverWait in a while loop and catching the exceptions?
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break  # it will break from the loop once the specific element is present.
    except TimeoutException:
        print("Loading took too much time! - Try again")
You can do that very simply with this function:
def page_is_loading(driver):
    while True:
        x = driver.execute_script("return document.readyState")
        if x == "complete":
            return True
        else:
            yield False
and when you want to do something after the page load is complete, you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
# page_is_loading() is a generator, so iterate it until it finishes (readyState == "complete")
for _ in page_is_loading(Driver):
    continue
Driver.execute_script("alert('page is loaded')")
Use this in your code:
from selenium import webdriver
driver = webdriver.Firefox() # or Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.......")
Or you can use this code if you are looking for a specific tag:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox() #or Chrome()
driver.get("http://www.......")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "tag_id"))
    )
finally:
    driver.quit()
Very good answers here. Quick example of wait for XPATH.
# wait for sizes to load - 2s timeout
try:
    WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@id='stockSizes']//a")))
except TimeoutException:
    pass
I struggled a bit to get this working, as it didn't work for me as expected. Anyone who is still struggling to get this working may check this.
I want to wait for an element to be present on the webpage before proceeding with my manipulations.
We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function which it can execute every 1 second for the timeout period provided (10 seconds in our case). So keeping it like below worked for me:
wait_for_element = WebDriverWait(driver, 10, 1)
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
Here is what until() does behind the scenes:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None
    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
If you are trying to scroll and find all items on a page, you can consider using the following. This is a combination of a few methods mentioned by others here, and it did the job for me:
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_1 = len(elem1)
        print(f"A list Length {len_elem_1}")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_2 = len(elem2)
        print(f"B list Length {len_elem_2}")
        if len_elem_1 == len_elem_2:
            print(f"final length = {len_elem_1}")
            break
    except TimeoutException:
        print("Loading took too much time!")
Selenium can't detect whether the page is fully loaded or not, but JavaScript can. I suggest you try this:
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
This will execute JavaScript code instead of Python, because JavaScript can detect when the page is fully loaded: it will show 'complete'. This code means: for up to 100 seconds, keep checking document.readyState until 'complete' shows up.
nono = driver.current_url
driver.find_element(By.XPATH, "//button[@value='Send']").click()
while driver.current_url == nono:
    pass
print("page loaded.")
