Python Selenium: Loop webpage until element is found

I'm trying to write an automation test that goes to a specific webpage, clicks a link inside it, and stays on the next page. That next page will not always show (sometimes it shows a connection error), so I'm making the script look for a specific element (one that is supposed to be on the page I'm looking for): if the element is found, it stays there; if not, it goes back to the beginning of the script (making a loop). I can make it go to the page and click the link, but when the connection error happens it just stays there and won't go back. I'm pretty new to Python and Selenium, so there are probably some things I still don't fully understand, but I'm learning with practice. I'm stuck here:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
def driver_init():
    driver = webdriver.Chrome(executable_path=r'C:#the path')
    driver.get('#the webpage')
    element = driver.find_element(By.XPATH, '//*[@id="conten"]/table/tbody/tr[1]/td/a')
    element.click()
    elt = driver.find_element(By.XPATH, '//*[@id="logint"]/div/a/img')
    elt = True

driver_init()
while elt = False:
    driver_init()
How can I make it repeat the script if the element is not shown, or if the connection error happens? To be clear, it's not about refreshing the page, it's about going back to the beginning. I've tried many things but none of them helped, so there may be some leftover text from those attempts. Thanks in advance for the help.

What you're describing is needing to apply a condition to a result after page load. A Selenium wait for an element will pass or fail, but in my experience it doesn't allow conditional logic to wrap around it: once it fails to find the element, the test ends. Beyond your example, this case can come up with A/B testing, where 30% of the time a different page loads and you want to divert the test based on what actually loaded.
What you can do is create a method holding your main test automation, and a secondary method to handle waiting or restarting your test. This secondary method uses a regex to look for an item in the page source. A regex can have a condition built around it without prematurely ending the test. Here's an example:
import re  # import the regex module

def your_test(driver):
    # your selenium test code up to when you need to wait
    # pass the webdriver along with the text to look for in the page source
    find_or_restart(driver, "Thank You!")
    # your test code after something is found on the page...

def find_or_restart(driver, text):
    src = driver.page_source  # load the page source
    text_found = re.search(r'%s' % text, src)  # regex search for the text
    print("checking for text on page...")
    if text_found:
        return
    else:
        print("not found. restart test...")
        your_test(driver)
The connection error should be investigated.
Another aspect to consider is an infinite loop if the fail case happens 100% of the time. You can add a counter, as a global variable, that increments each time the else branch is triggered. Then, in the find_or_restart method, you check the value of the counter, and if it's greater than a specified amount, you end the test as a fail.
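For instance, a minimal sketch of that counter (the limit of 5 and failing via sys.exit are placeholder choices, not part of the original answer):

import re
import sys

restart_count = 0   # global counter, as described above
MAX_RESTARTS = 5    # hypothetical limit before giving up

def find_or_restart(driver, text):
    global restart_count
    src = driver.page_source
    text_found = re.search(r'%s' % text, src)
    print("checking for text on page...")
    if text_found:
        return
    restart_count += 1
    if restart_count > MAX_RESTARTS:
        driver.quit()
        sys.exit("test failed: too many restarts")  # end the test as a fail
    print("not found. restart test...")
    your_test(driver)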
The QA community will sometimes argue that such a test, working around a connection error like this, is not a valid workaround; instead, they will suggest the tester let the test fail so someone can fix the issue you're attempting to work around. That's probably the best practice in general; however, in some cases you might be testing with third parties whose uptime you can't control. Sometimes workarounds are required. Examples abound in telecom (automating a phone call that traverses multiple carriers), in building bots for games, etc. Anyway, I hope this helps.

First of all, you should use a Selenium wait to wait for the element to be present.
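For example, a minimal sketch using the locator from the question:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    elt = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.XPATH, '//*[@id="logint"]/div/a/img')))
except TimeoutException:
    pass  # the element never appeared; handle the retry here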
Repeating part of the code on failure is not a common practice. Why is this page not always loaded properly?

I am not going to write the code here; I'll give you the solution in writing and let you code it yourself.
Step 1: Create two methods: the first initializes the webdriver, the second kills the webdriver.
Step 2: Declare a FLAG as false. Once you initialize the webdriver and invoke Chrome, perform your check for the particular element using a WebDriverWait. If it is found, proceed to the next step and set FLAG to true.
Step 3: If FLAG is still false, call the kill-webdriver method, then call the driver_init() method again.
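A minimal sketch of those steps, reusing the placeholder path and XPaths from the question:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def init_driver():
    return webdriver.Chrome(executable_path=r'C:#the path')

def kill_driver(driver):
    driver.quit()

flag = False
while not flag:
    driver = init_driver()
    driver.get('#the webpage')
    driver.find_element(By.XPATH, '//*[@id="conten"]/table/tbody/tr[1]/td/a').click()
    try:
        WebDriverWait(driver, 10).until(EC.presence_of_element_located(
            (By.XPATH, '//*[@id="logint"]/div/a/img')))
        flag = True  # element found: keep this driver and stay on the page
    except TimeoutException:
        kill_driver(driver)  # page failed to load: kill the driver and retry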


Selenium: Can't find element by class name in any way

I have this problem where I can't access a button through its class name in any way I could think of.
This is the HTML:
<button class="expand-button">
<faceplate-number pretty="" number="18591"><!---->18.591</faceplate-number> weitere Kommentare anzeigen
</button>
I tried to access it using:
driver.find_element(By.CLASS_NAME, "expand-button")
But the error tells me that there was no such element.
I also tried XPath and a CSS selector, neither of which appeared to work.
I would be glad for any help!
Kind Regards and Thanks in advance
Eirik
Possible issue 1
It could be because you check before the element is created in the DOM.
One way to solve this problem is by using a wait, like below:
driver.implicitly_wait(10)
driver.get("http://somedomain/url_that_delays_loading")
my_dynamic_element = driver.find_element(By.ID, "myDynamicElement")
You can read more about it here: https://www.selenium.dev/documentation/webdriver/waits/#implicit-wait
Another way is by using a fluent wait, which marks the maximum amount of time for Selenium WebDriver to wait for a certain condition (a web element) to become visible. It also defines how frequently WebDriver will check whether the condition appears before throwing an ElementNotVisibleException.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

# Declare and initialise a fluent-style wait: 5 second timeout,
# polling every 250 ms, ignoring NoSuchElementException
wait = WebDriverWait(driver, timeout=5, poll_frequency=0.25,
                     ignored_exceptions=[NoSuchElementException])
# Specify the condition to wait on
element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "expand-button")))
You can also read more about that in the official documentation:
https://www.selenium.dev/documentation/webdriver/waits/#fluentwait
Possible issue 2
It is also possible that the element is partially or completely blocked by an element overlaying it. If that is the case, you will have to dismiss the overlaying element before you can perform any action on your target.
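For example, a sketch that waits for a hypothetical overlay to disappear before clicking the target (the overlay's locator is an assumption; inspect your page for the real one):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
# wait until the overlay is gone, then interact with the button
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".overlay")))
driver.find_element(By.CLASS_NAME, "expand-button").click()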

Is there a way with python-selenium to wait until all elements of a page have loaded?

I am asking about generally checking whether all elements of a page have been loaded. Is there a way to do that, basically?
In the concrete example, there is a page, I click on some button, and then I have to wait before I click the 'Next' button. However, this 'Next' button is available, selectable, and clickable ALL THE TIME. So how do I check that 'state' of a page with Selenium?
As a reminder: this is a question about Selenium and not about the quality of the webpage in question...
As your question is whether there is a way with python-selenium to wait until all elements of a page have loaded, the answer is no.
An Alternate Approach
Fundamentally, you can write a line of code (or implement it through a function) to check whether document.readyState equals "complete", as follows:
self.driver.execute_script("return document.readyState") == "complete"
But this approach has a drawback: it doesn't account for JavaScript / AJAX calls that may still be in flight.
Why No
Waiting for page load in this way is already implemented by default within Selenium, so rewriting it is pure overhead. The client (i.e. the web browser) will not return control to the WebDriver instance until document.readyState equals "complete"; once this condition is fulfilled, Selenium moves on to the next line of code.
It's worth mentioning that although the client returns control to the WebDriver instance once document.readyState equals "complete", that doesn't guarantee that all the WebElements in the new HTML DOM are present, visible, interactable, and clickable.
So, if the next WebElement you have to interact with is not interactable by the time document.readyState equals "complete", you have to induce a WebDriverWait in conjunction with a matching expected condition for that individual WebElement. Here are a few of the most used expected conditions:
element_to_be_clickable(locator)
presence_of_element_located(locator)
visibility_of(element)
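For instance, a minimal sketch for the 'Next' button scenario from the question (the locator is hypothetical):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the button to become clickable, then click it
wait = WebDriverWait(driver, 10)
next_button = wait.until(EC.element_to_be_clickable((By.ID, "next")))
next_button.click()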
References
You can find a couple of relevant discussions in:
Do we have any generic function to check if page has completely loaded in Selenium
Selenium IE WebDriver only works while debugging
Selenium how to manage wait for page load?
Reliably determining whether a page has fully loaded can be challenging; there is no way to know, just like that, whether all the elements have loaded. You must define some "anchor" points on each page, so that, as far as you are aware, if these elements have loaded it is fair to assume the whole page has loaded. Usually this involves a combination of tests. For example, you can define that if the combination of tests below passes, the page is considered loaded:
JavaScript document.readyState === 'complete'
"Anchor" elements
All kinds of "spinners", if any exist, have disappeared.
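A minimal sketch of such a combined check (both locators are hypothetical):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 15)
# 1. the document reports itself complete
wait.until(lambda d: d.execute_script("return document.readyState") == "complete")
# 2. an "anchor" element of the page is visible
wait.until(EC.visibility_of_element_located((By.ID, "main-content")))
# 3. the loading spinner, if any, has disappeared
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".spinner")))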
There is something called document.readyState, which you can retrieve by executing a JavaScript snippet via Selenium (note that this doesn't account for dynamically loaded content). It returns one of three states:
loading - the document is still being loaded.
interactive - the document has been parsed, but sub-resources such as CSS and images are still loading.
complete - both the document and its sub-resources have finished loading.
You're looking for at least Interactive. You can retrieve the state by calling execute_script:
driver.execute_script("return document.readyState")
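If you want to block until that state is reached, one option (a minimal sketch) is to wrap the call in a WebDriverWait with a lambda:

from selenium.webdriver.support.ui import WebDriverWait

# poll until the document reports at least "interactive"
WebDriverWait(driver, 10).until(
    lambda d: d.execute_script("return document.readyState") in ("interactive", "complete"))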

Possible bottle-neck issue in web-scraping with Python

First of all, I apologize for the vague title, but the problem is that I'm not sure what is causing the error.
I'm using Python to extract some data from a website.
The code I created works perfectly when passing one link at a time, but somehow breaks when trying to collect the data from the 8000 pages I have (it actually breaks way before that). The process I need is this:
1. Collect all the links from one single page (8000 links).
2. From each link, extract another link contained in an iframe.
3. Scrape the data from the link found in step 2.
Step 1 is easy and works fine.
Steps 2 and 3 work for a while, and then I get some errors, every time at a different point, and it's never the same one. After some tests, I decided to try a different approach and run my code only up to step 2 on all the links from step 1, collecting all the links first. And at this point I found out that this is probably the stage where I get the error.
The code works like this: in a for loop, I pass each item of a list of URLs to the function below. It's supposed to search for a link to the Disqus website; there should be only one link, and there is always one link. Because a library like lxml cannot look inside an iframe, I use Selenium and ChromeDriver.
def get_url(webpage_url):
    chrome_driver_path = '/Applications/chromedriver'
    driver = webdriver.Chrome(chrome_driver_path)
    driver.get(webpage_url)
    iframes = driver.find_elements_by_tag_name("iframe")
    list_urls = []
    urls = []
    # collects all the urls of all the iframe tags
    for iframe in iframes:
        driver.switch_to_frame(iframe)
        time.sleep(3)
        list_urls.append(driver.current_url)
        driver.switch_to_default_content()
    driver.quit()
    for item in list_urls:
        if item.startswith('http://disqus'):
            urls.append(item)
    if len(urls) > 1:
        print "too many urls collected in iframes"
    else:
        url = urls[0]
    return url
At the beginning there was no time.sleep and it worked for roughly 30 links. Then I put in a time.sleep(2) and it got to about 60. Now, with time.sleep(3), it works for around 130 links. Of course, this cannot be a solution. The error I get now is always the same (index out of range on url=urls[0]), but each time with a different link. If I run my code on the single link where it breaks, the code works, so it can actually find URLs there. And of course, sometimes it gets past a link where it stopped before with no issue.
I suspect I get this because of some kind of timeout, but of course I'm not sure.
So, how can I understand what the issue is here?
If the problem is that it makes too many requests (even with the sleep), how can I deal with this?
Thank you.
From your description of the problem, it might be that the host throttles your client when you issue too many requests in a given time. This is a common protection against DoS attacks and ill-behaved robots - like yours.
The clean solution here is to check whether the site has a robots.txt file and, if so, parse it and respect its rules; otherwise, set a large enough wait time between two requests so you don't get kicked.
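A sketch of that check with the standard library's robotparser module (urllib.robotparser on Python 3; the site URL is hypothetical):

import robotparser  # urllib.robotparser in Python 3

rp = robotparser.RobotFileParser()
rp.set_url('http://example.com/robots.txt')
rp.read()
if rp.can_fetch('*', 'http://example.com/some/page'):
    pass  # allowed -- go ahead and fetch (politely, with a delay)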
You can also run into quite a few other issues - 404s, lost network connections, etc. - and even load-time issues with selenium.webdriver, as documented here:
Dependent on several factors, including the OS/Browser combination,
WebDriver may or may not wait for the page to load. In some
circumstances, WebDriver may return control before the page has
finished, or even started, loading. To ensure robustness, you need to
wait for the element(s) to exist in the page using Explicit and
Implicit Waits.
Regarding your IndexError: you blindly assume that you'll get at least one URL (which means at least one iframe), which might not be the case for any of the reasons above (and a few others too). First, make sure you properly handle all corner cases; then fix your code so it doesn't assume you have at least one URL:
url = None
if len(urls) > 1:
    print "too many urls collected in iframes"
elif len(urls) == 1:
    url = urls[0]
else:
    print "no url found"
Also, if all you want is the first http://disqus URL you can find, there's no need to collect them all, filter them, and then return the first one:
def get_url(webpage_url):
    chrome_driver_path = '/Applications/chromedriver'
    driver = webdriver.Chrome(chrome_driver_path)
    driver.get(webpage_url)
    iframes = driver.find_elements_by_tag_name("iframe")
    url = None
    # inspect each iframe and stop at the first disqus url
    for iframe in iframes:
        driver.switch_to_frame(iframe)
        time.sleep(3)
        if driver.current_url.startswith('http://disqus'):
            url = driver.current_url
            break
        driver.switch_to_default_content()
    driver.quit()
    return url  # None if nothing was found

Using Selenium and Python, how to check whether an element exists but is not actually visible

I have been using the find_element_by_xpath or cssSelector to locate elements on a page.
Today I ran into a problem where the XPath of an alert message is present in the HTML but not actually visible on the site. An example: JS displays a banner message when the user enters a page, which disappears after 3 seconds.
The CSS selector span.greet will always return an element from the HTML, but that doesn't necessarily mean it is displayed on the page.
...
<span class="greet">Hello</span>
<span class="greetAgain">Hello Again!</span>
...
I read the documentation on is_Visible(), but I'm not quite sure I fully understand whether it could be a solution. If not, are there other methods I could use instead?
I had a similar problem, but in my case another element was overlaying the actual element, so I found a solution using the JavaScript executor instead of clicking with the webdriver. Waiting for a fixed amount of time can cause random errors during the tests.
Example: clicking an OK button:
ok_btn = self.driver.find_element_by_xpath("//button[contains(.,'OK')]")
self.driver.execute_script("arguments[0].click();", ok_btn)
After loading the page via selenium, the element may be visible when you test, but hidden after some time.
A simple way would be to wait for a fixed period of time.
You can use time.sleep to pause your script if you want to wait for the element to hide.
import time

def your_test_function():
    # initial tests (to check if elements are displayed)
    time.sleep(3)
    # delayed tests (to check if something is hidden)
If you need more control, or you want to wait for elements to be displayed / hidden, you can use the webdriver's wait methods:
http://selenium-python.readthedocs.org/en/latest/waits.html?highlight=wait
Also, you should use the is_displayed method to check whether the element is visible:
http://selenium-python.readthedocs.org/en/latest/api.html?highlight=is_displayed#selenium.webdriver.remote.webelement.WebElement.is_displayed
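For example, a minimal check against the question's span.greet element:

from selenium.webdriver.common.by import By

element = driver.find_element(By.CSS_SELECTOR, "span.greet")
if element.is_displayed():
    print("the banner is currently visible")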
You need to explicitly wait for the visibility_of_element_located expected condition; in other words, wait for the element to become visible:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.greet")))
Resist the temptation to use time.sleep() - it is very unreliable and error-prone.

Selenium Webdriver - NoSuchElementExceptions

I am using the Python unit-testing library (unittest) with Selenium WebDriver. I am trying to find an element by its name. About half of the time the tests throw a NoSuchElementException, and the other half they do not.
I was wondering if it has to do with the Selenium WebDriver not waiting long enough for the page to load.
driver = webdriver.WhatEverBrowser()
driver.implicitly_wait(60) # This line will cause it to search for 60 seconds
It only needs to be inserted into your code once (I usually do it right after creating the webdriver object).
For example, if your page for some reason takes 30 seconds to load (buy a new server), and the element is one of the last things to show up on the page, the driver pretty much just keeps checking over and over again whether the element is there, for up to 60 seconds; then, if it doesn't find it, it throws the exception.
Also make sure your scope is correct: i.e., if you are focused on a frame and the element you are looking for is NOT in that frame, it will NOT find it.
I see that too. What I do is just wait it out...
you could try:
import time
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        x = driver.find_element_by_name('some_name')
        break
    except NoSuchElementException:
        time.sleep(1)
        # possibly use driver.get() again if needed
Also, try updating your Selenium to the newest version with pip install --upgrade selenium.
I put my money on a frame, as I had a similar issue before :)
Check your HTML again: if you are dealing with a frame, switching to the correct frame will let you locate the element.
Python
driver.switch_to_frame("frameName")
Then search for the element.
If not, try adding a wait time as others suggested.
One way to handle waiting for an element to appear is like this:
import selenium.webdriver.support.ui as ui

wait = ui.WebDriverWait(driver, 10)
wait.until(lambda driver: driver.find_element_by_name('some_name'))
elem = driver.find_element_by_name('some_name')
You are correct that the webdriver may not be waiting long enough: driver.get() returns once the document has loaded, but there is no built-in wait for elements that appear later.
To resolve this, define an explicit wait, so that the driver doesn't search for the WebElement while the page is still loading.
The URL below helps with this:
http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp
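For example, a minimal sketch (the element name is the placeholder used elsewhere in this thread):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 15 seconds for the element to be present before using it
element = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.NAME, "some_name")))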
You need to wait until your element loads. If you are sure that your element will eventually appear on the page, this ensures that whatever validations you run occur only after your expected element has loaded.
I feel it might be a synchronisation issue (i.e. a mismatch between the webdriver's speed and the application's speed).
Use an implicit wait:
driver.implicitly_wait(9)  # seconds (the original 9000 milliseconds)
