selenium wait until html is reloaded

selenium wait until html is reloaded - python

I have a script with python and selenium to scrape google results.. It works, but I'm looking for a better solution to wait until all 100 search results are fetched
I use this solution to wait until the search is done
driver.wait.until(EC.presence_of_element_located(
(By.ID, 'resultStats')))
This works, but I need to get 100 search results so I do this
driver.get(driver.current_url+'&num=100')
But now its not possible to re-use this line because the element ID is already written to the page..
driver.wait.until(EC.presence_of_element_located(
(By.ID, 'resultStats')))
Instead I use this solution, but its not a consistent solution (if the request takes more than 5 secs)
time.sleep(5)
code
url = 'https://www.google.com'
driver.get(url)
try:
box = driver.wait.until(EC.presence_of_element_located(
(By.NAME, 'q')))
box.send_keys(query.decode('utf-8'))
button = driver.wait.until(EC.element_to_be_clickable(
(By.NAME, 'btnG')))
button.click()
except TimeoutException:
error('Box or Button not found in google.com')
try:
driver.wait.until(EC.presence_of_element_located(
(By.ID, 'resultStats')))
driver.get(driver.current_url+'&num=100')
# Need a better solution to wait until all results are loaded
time.sleep(5)
print driver.find_element_by_tag_name('body').get_attribute('innerHTML').encode('utf-8')
except TimeoutException:
error('No results returned by Google. Could be HTTP 503 response')

You are absolutely right that time.sleep(5) is not a reliable and good way to wait for something on the page. You would need to use WebDriverWait class and a specific condition to wait for.
In this case, I'd wait for the count of elements with class="g" (which represents a search result) would be greater or equal to 100 via a custom Expected Condition:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
class wait_for_n_elements(object):
def __init__(self, locator, count):
self.locator = locator
self.count = count
def __call__(self, driver):
try:
count = len(EC._find_elements(driver, self.locator))
return count >= self.count
except StaleElementReferenceException:
return False
Usage:
wait = WebDriverWait(driver, 10)
wait.until(wait_for_n_elements((By.CSS_SELECTOR, ".g"), 100)

Related

How to loop code until it works in Python?

I have a long code, here is a small excerpt from it. You constantly need to find elements, insert data and click buttons. Sometimes something fails to load and errors pop up. Is it possible to have Python try these commands until it succeeds? I can't use
time.sleep()
with a long delay as it will greatly increase the execution time and even that doesn't always help(
start = driver.find_element(By.CSS_SELECTOR, "SELECTOR")
start.click()
time.sleep(1)
start2 = driver.find_element(By.CSS_SELECTOR, "SELECTOR")
start2.click()
time.sleep(1)

If you use a method that waits for an element to be clickable first, then you can set a default timeout, and then use that, as shown below with this partial code example that uses WebDriverWait:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
def wait_for_element_clickable(
driver, selector, by="css selector", timeout=10
):
try:
return WebDriverWait(driver, timeout).until(
EC.element_to_be_clickable((by, selector))
)
except Exception:
raise Exception(
"Element {%s} was not visible/clickable after %s seconds!"
% (selector, timeout)
)
# ...
element = wait_for_element_clickable(driver, "button#id")
element.click()

Are there any methods in Selenium for waiting until a page is fully loaded?

I am trying to find a method that waits when all elements on the page will be fully loaded.
I found out a good idea and it is working very well:
def waiting_new_page(link: WebElement) -> None:
waiting_update = True
while waiting_update:
try:
link.find_element(By.ID, "does not matter")
except NoSuchElementException:
sleep(1)
except StaleElementReferenceException:
waiting_update = False
options = webdriver.ChromeOptions()
options.add_argument("window-size=800,600")
browser = webdriver.Chrome(options=options)
link = browser.find_element(
By.NAME,
"thePage:SiteTemplate:theForm:Continue"
)
link.click()
waiting_new_page(link)
Are there any ready methods in Selenium to wait until page is fully loaded?

You can use WebDriverWait function in Selenium
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
WebDriverWait(browser, timeout=100).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'body')))
Or instead of body you can put your selector you want wait until to load.

Can't fetch the texts from a webpage

I've created a script using python and selenium to get all the text available out there in the following link. The webpage has got lazyloading method active and that is why more content become visible upon each scrolling. My script can handle that too.
However, the problem is when my script makes the webpage exhaust its content by reaching the bottom, it stucks right there. Once it can breaks out of the loop, I can fetch the content. How can I break out of the loop?
I know .LoadingDots is always there. And that is the only reason I can't find any logic to break the loop.
Link to that site
Here is what I've tried so far: (couldn't get rid of the loop)
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get("https://www.quora.com/topic/American-Football")
while True:
try:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots")))
except Exception: break
for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))):
print(item.text)
driver.quit()
I know I can solve the issue if I comply with the following:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get("https://www.quora.com/topic/American-Football")
last_len = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))))
while True:
for load_more in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a[id$='_more']"))):
driver.execute_script("arguments[0].click();",load_more)
try:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
wait.until(lambda driver: len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para")))) > last_len)
items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para")))
last_len = len(items)
except TimeoutException: break
for item in items:
print(item.text)
driver.quit()
My question is: how can i fetch the content from that page exhausting all the scrolls using the way I tried with my first script making use of .LoadingDots?

When the page is scrolled to the button the element with classes .LoadingDots.regular remains the same, but its parent element adds new class hidden. You can check if the class was added using get_attribute function. You can also locate it directly with the class spinner_display_area
while True:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
loading_dots = driver.find_element_by_class_name('spinner_display_area')
if 'hidden' in loading_dots.get_attribute('class'):
break;

Your script doesn't work as expected because (By.CSS_SELECTOR, ".LoadingDots") selector returns this element <div class="LoadingDots tiny"> and it is always hidden so your expectation of its invisibility always returns True and loop cannot be broken.
You need to check another element with "LoadingDots" class name: <div class="LoadingDots regular"> and the logic should be following:
Scroll page down
Wait for loading dots to appear (start loading more content)
Wait for loading dots to disappear (loading more content is done)
If after page scrolled we see no dots - break the loop
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 5)
driver.get("https://www.quora.com/topic/American-Football")
while True:
try:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots.regular")))
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".LoadingDots.regular")))
except Exception: continue
else: break
for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ui_qtext_rendered_qtext .ui_qtext_para"))):
print(item.text)
driver.quit()
BUT! Note that I've posted this script just to point on reason why your script is not working... It's not really efficient as in case content loaded too fast (possibility is quite low, but...) script might not catch the moment when loading dots appeared and you'll not get all required content.
So #Guy solution seem to be more reliable (+1)

Telling selenium to stop blocking after expected condition

I am interested in one element, let's call it
<div class="ofInterest" some-attr="dataIReallyWant"></div>
When I switch off js in firefox, this element does not exist. With javascript it does. I could not tell how it was being generated but my guess is that there is an ajax call which returns a js file which executes this javascript.
I am using selenium but it is very slow. I want to tell Selenium this:
Wait for this element to load, i.e something like EC.visibility_of_element_located((By.CSS, '.ofInterest'))
once you detect said element, stop blocking the code and don't download any further so don't waste my bandwidth

Simply wait for the element to exist in the DOM, then either quit/close the browser or execute some JavaScript to stop the page from loading:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
element = WebDriverWait(driver, 10).until( #10 second timeout.
EC.presence_of_element_located((By.ID, "myDynamicElement"))
)
finally:
driver.quit()
# OR
#driver.execute_script("window.stop();")
More information can be found here.

Explicit waits were made exactly for what you are describing:
An explicit waits is code you define to wait for a certain condition
to occur before proceeding further in the code. The worst case of this
is time.sleep(), which sets the condition to an exact time period to
wait.
In the worst case scenario, you would wait X amount of seconds that you've passed to the WebDriverWait, 10 seconds in this case:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "myDynamicElement"))
)
But, if the element is found earlier, it would give you the element and stop blocking the execution. By default, it checks for the expected condition every 500ms.
FYI, under-the-hood, it is just a while True: loop:
def until(self, method, message=''):
"""Calls the method provided with the driver as an argument until the \
return value is not False."""
screen = None
stacktrace = None
end_time = time.time() + self._timeout
while True:
try:
value = method(self._driver)
if value:
return value
except self._ignored_exceptions as exc:
screen = getattr(exc, 'screen', None)
stacktrace = getattr(exc, 'stacktrace', None)
time.sleep(self._poll)
if time.time() > end_time:
break
raise TimeoutException(message, screen, stacktrace)

Wait until page is loaded with Selenium WebDriver for Python

I want to scrape all the data of a page implemented by a infinite scroll. The following python code works.
for i in range(100):
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
This means every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated contents. But, this may not be time efficient. The page may finish loading the new contents within 5 seconds. How can I detect whether the page finished loading the new contents every time I scroll down? If I can detect this, I can scroll down again to see more contents once I know the page finished loading. This is more time efficient.

The webdriver will wait for a page to load by default via .get() method.
As you may be looking for some specific element as #user227215 said, you should use WebDriverWait to wait for an element located in your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
print "Page is ready!"
except TimeoutException:
print "Loading took too much time!"
I have used it for checking alerts. You can use any other type methods to find the locator.
EDIT 1:
I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for ajax requests. It means when you use .get('url'), your browser will wait until the page is completely loaded and then go to the next command in the code. But when you are posting an ajax request, webdriver does not wait and it's your responsibility to wait an appropriate amount of time for the page or a part of page to load; so there is a module named expected_conditions.

Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
print "Timed out waiting for page to load"
This matches the example in the documentation. Here is a link to the documentation for By.

Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
page_state = self.driver.execute_script('return document.readyState;')
return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
try:
new_page = browser.find_element_by_tag_name('html')
return new_page.id != old_page.id
except NoSuchElementException:
return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
#contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
old_page = self.find_element_by_tag_name('html')
yield
WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.

As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find somewhere all the possible locators that can be used with the By, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID
Used in the example; finds elements by their HTML id attribute
CLASS_NAME
Used to find elements by their HTML class attribute. Why is this
function CLASS_NAME not simply CLASS? Using the form object.CLASS
would create problems for Selenium's Java library, where .class is a
reserved method. In order to keep the Selenium syntax consistent
between different languages, CLASS_NAME was used instead.
CSS_SELECTOR
Finds elements by their class, id, or tag name, using the #idName,
.className, tagName convention.
LINK_TEXT
Finds HTML tags by the text they contain. For example, a link that
says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT
Similar to LINK_TEXT, but matches on a partial string.
NAME
Finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME
Finds HTML tags by their tag name.
XPATH
Uses an XPath expression ... to select matching elements.

From selenium/webdriver/support/wait.py
driver = ...
from selenium.webdriver.support.wait import WebDriverWait
element = WebDriverWait(driver, 10).until(
lambda x: x.find_element_by_id("someId"))

On a side note, instead of scrolling down 100 times, you can check if there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded)
def scrollDown(driver, value):
driver.execute_script("window.scrollBy(0,"+str(value)+")")
# Scroll down the page
def scrollDownAllTheWay(driver):
old_page = driver.page_source
while True:
logging.debug("Scrolling loop")
for i in range(2):
scrollDown(driver, 500)
time.sleep(2)
new_page = driver.page_source
if new_page != old_page:
old_page = new_page
else:
break
return True

Have you tried driver.implicitly_wait. It is like a setting for the driver, so you only call it once in the session and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds it will execute the command as soon as possible, waiting 10 seconds before it gives up. I've used this in similar scroll-down scenarios so I don't see why it wouldn't work in your case. Hope this is helpful.
To be able to fix this answer, I have to add new text. Be sure to use a lower case 'w' in implicitly_wait.

Here I did it using a rather simple form:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
searchTxt=''
while not searchTxt:
try:
searchTxt=browser.find_element_by_name('NAME OF ELEMENT')
searchTxt.send_keys("USERNAME")
except:continue

Solution for ajax pages that continuously load data. The previews methods stated do not work. What we can do instead is grab the page dom and hash it and compare old and new hash values together over a delta time.
import time
from selenium import webdriver
def page_has_loaded(driver, sleep_time = 2):
'''
Waits for page to completely load by comparing current page hash values.
'''
def get_page_hash(driver):
'''
Returns html dom hash
'''
# can find element by either 'html' tag or by the html 'root' id
dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
# dom = driver.find_element_by_id('root').get_attribute('innerHTML')
dom_hash = hash(dom.encode('utf-8'))
return dom_hash
page_hash = 'empty'
page_hash_new = ''
# comparing old and new page DOM hash together to verify the page is fully loaded
while page_hash != page_hash_new:
page_hash = get_page_hash(driver)
time.sleep(sleep_time)
page_hash_new = get_page_hash(driver)
print('<page_has_loaded> - page not loaded')
print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))

How about putting WebDriverWait in While loop and catching the exceptions.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
while True:
try:
WebDriverWait(browser, delay).until(EC.presence_of_element_located(browser.find_element_by_id('IdOfMyElement')))
print "Page is ready!"
break # it will break from the loop once the specific element will be present.
except TimeoutException:
print "Loading took too much time!-Try again"

You can do that very simple by this function:
def page_is_loading(driver):
while True:
x = driver.execute_script("return document.readyState")
if x == "complete":
return True
else:
yield False
and when you want do something after page loading complete,you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
while not page_is_loading(Driver):
continue
Driver.execute_script("alert('page is loaded')")

use this in code :
from selenium import webdriver
driver = webdriver.Firefox() # or Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.......")
or you can use this code if you are looking for a specific tag :
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox() #or Chrome()
driver.get("http://www.......")
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "tag_id"))
)
finally:
driver.quit()

Very good answers here. Quick example of wait for XPATH.
# wait for sizes to load - 2s timeout
try:
WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
(By.XPATH, "//div[#id='stockSizes']//a")))
except TimeoutException:
pass

I struggled a bit to get this working as that didn't worked for me as expected. anyone who is still struggling to get this working, may check this.
I want to wait for an element to be present on the webpage before proceeding with my manipulations.
we can use WebDriverWait(driver, 10, 1).until(), but the catch is until() expects a function which it can execute for a period of timeout provided(in our case its 10) for every 1 sec. so keeping it like below worked for me.
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
here is what until() do behind the scene
def until(self, method, message=''):
"""Calls the method provided with the driver as an argument until the \
return value is not False."""
screen = None
stacktrace = None
end_time = time.time() + self._timeout
while True:
try:
value = method(self._driver)
if value:
return value
except self._ignored_exceptions as exc:
screen = getattr(exc, 'screen', None)
stacktrace = getattr(exc, 'stacktrace', None)
time.sleep(self._poll)
if time.time() > end_time:
break
raise TimeoutException(message, screen, stacktrace)

If you are trying to scroll and find all items on a page. You can consider using the following. This is a combination of a few methods mentioned by others here. And it did the job for me:
while True:
try:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.implicitly_wait(30)
time.sleep(4)
elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
len_elem_1 = len(elem1)
print(f"A list Length {len_elem_1}")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.implicitly_wait(30)
time.sleep(4)
elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
len_elem_2 = len(elem2)
print(f"B list Length {len_elem_2}")
if len_elem_1 == len_elem_2:
print(f"final length = {len_elem_1}")
break
except TimeoutException:
print("Loading took too much time!")

selenium can't detect when the page is fully loaded or not, but javascript can. I suggest you try this.
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
this will execute javascript code instead of using python, because javascript can detect when page is fully loaded, it will show 'complete'. This code means in 100 seconds, keep tryingn document.readyState until complete shows.

nono = driver.current_url
driver.find_element(By.XPATH,"//button[#value='Send']").click()
while driver.current_url == nono:
pass
print("page loaded.")

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

selenium wait until html is reloaded - python

Related

How to loop code until it works in Python?

Are there any methods in Selenium for waiting until a page is fully loaded?

Can't fetch the texts from a webpage

Telling selenium to stop blocking after expected condition

Wait until page is loaded with Selenium WebDriver for Python

Categories

Resources