How to periodically re-check a webpage using Selenium in Python

I am new to selenium in python (and all web-interface applications of python) and I have a task to complete for my present internship.
My script successfully navigates to an online database and inputs information from my data tables, but then the webpage in question takes anywhere from 30 seconds to several minutes to compute an output.
How do I go about instructing Python to re-check the page every 30 seconds until the output appears, so that I can parse it for the data I need? For instance, which functions might I start with?
This will be part of a loop repeated for over 200 entries, and hundreds more if I am successful, so it is worth my time to automate it.
Thanks

You should use Selenium's waits, as pointed out by G_M and Sam Holloway.
The one I use most is expected_conditions:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
It will wait (up to 10 seconds) for an element with id "myDynamicElement" to be present and then execute the rest of the try block, which should contain the rest of your work.
I prefer to use By.XPATH, but if you use By.XPATH with the method presence_of_element_located, add another pair of parentheses so the locator is passed as the required tuple, as noted in this answer:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//button[contains(text(),"Some text")]')
driver.find_element(By.XPATH, '//div[@id="id1"]')
driver.find_elements(By.XPATH, '//a')
The easiest way (for me) to find the XPath of an element is to open Chrome's developer tools (F12), press Ctrl+F, and use Inspect with the mouse to compose an XPath specific enough to match only the expected element, or as few elements as possible.
All the examples are from (or based on) the great Selenium documentation.

If you just want to space out checks, the time.sleep() function should work.
However, as G_M's comment says, you should look into Selenium waits. Think about this: is there an element on the page that will indicate that the result is loaded? If so, use a Selenium wait on that element to make sure your program is only pausing until the result is loaded and not wasting any time afterwards.
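If you would rather poll yourself than use WebDriverWait, the check-every-30-seconds loop can be sketched as a small helper. This is a minimal sketch; the helper name, defaults, and structure are my own, not part of Selenium:

```python
import time

def poll_until(check, timeout=300, interval=30):
    """Call check() every `interval` seconds until it returns a truthy
    value or `timeout` seconds elapse. Returns check()'s truthy result,
    or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None  # gave up waiting
        time.sleep(interval)
```

With Selenium this might be called as `poll_until(lambda: driver.find_elements(By.ID, "output"), timeout=300, interval=30)` (the locator is hypothetical), since find_elements returns an empty, falsy list while nothing matches yet.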

Related

WebScraping through multiple sites with Selenium

I'm using Selenium in a project that consists of opening a range of websites, that contains pretty much the same structure, collecting data in each site and storing it.
The problem I ran into is that some of the sites I want to access are unavailable, and when the program gets to one of those it just stops.
What I want it to do is skip those and carry on with the next iterations, but so far my attempts have been fruitless... In my latest try I used the method is_displayed(), but apparently it only tells me whether an element is visible, not whether it is present.
if driver.find_element_by_xpath('//*[@id="main-2"]/div[2]/div[1]/div[1]/div/div[1]/strong').is_displayed():
The example above doesn't work, because the driver needs to find the element before telling me whether it is visible, but the element is simply not there.
Have any of you dealt with something similar?
You can use Selenium expected conditions to wait for element presence.
I'm just giving an example below.
I have defined the timeout as 5 seconds here, but you can use any timeout value.
Also, your element locator looks brittle.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element_xpath_locator = '//*[@id="main-2"]/div[2]/div[1]/div[1]/div/div[1]/strong'
wait = WebDriverWait(browser, 5)
wait.until(EC.presence_of_element_located((By.XPATH, element_xpath_locator)))

Python, Selenium and Chrome - How can I detect the end of a page with dynamically generated content?

I have gone through existing questions and Google results of a similar nature; none of the solutions posed has worked for me on the particular website I am currently scraping:
https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu
I am sending Page Down keys to the body element, which loads each item to be scraped. I have two issues with this: first, I am unable to detect when the scrolling has stopped; second, I have to manually click the browser window as it opens to allow the keys to be sent. I am not sure how to mimic this same focus-giving behavior in code.
elem = driver.find_element_by_tag_name("body")
elem.send_keys(Keys.PAGE_DOWN)
I have tried the following, in many different iterations, and the number printed never changed regardless of how far down the page I was, or whether I used innerHeight, or body instead of documentElement.
height = driver.execute_script("return document.documentElement.scrollHeight")
If I attempt to scroll down the page using a similar approach, this page does not move.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I am unsure if this has to do with iframes or if I am simply misunderstanding the best approach.
Still have been unable to find a way to reliably detect the end of the page.
Thank you!
After adding the required imports
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
you can validate that the bottom of the page is reached when the following element is visible:
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//p[contains(text(),'License')]"))
)
As for the second issue, try clicking on the following element with Selenium:
driver.find_element_by_id("products-container").click()
I have no environment to debug this, but I guess this will work.
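For detecting when the scrolling has actually stopped, another approach is to keep scrolling until the reported height stops growing. Below is a generic sketch with the browser interaction injected as callables; the helper and its names are my own. Note that since this menu is embedded, you may first need driver.switch_to.frame(...) before the reported height reflects the menu content at all:

```python
import time

def scroll_until_stable(do_scroll, get_height, pause=1.0, max_rounds=60):
    """Repeatedly scroll, then re-read the page height; stop when the
    height no longer grows (i.e. no new content was lazily loaded) or
    after max_rounds attempts. Returns the final height seen."""
    last = get_height()
    for _ in range(max_rounds):
        do_scroll()
        time.sleep(pause)  # give the page time to load new items
        new = get_height()
        if new == last:
            return new  # height stabilized: likely the end of the page
        last = new
    return last
```

With Selenium, do_scroll could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and get_height `lambda: driver.execute_script("return document.body.scrollHeight")`.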

Selenium/Python - Finding Dynamically Created Fields

Newbie here... 2 days into learning this.
In a learning management system, there is an element (a plus mark icon) to click which adds a form field upon each click.  The goal is to click the icon, which generates a new field, and then put text into the new field.  This field does NOT exist when the page loads... it's added dynamically based on the clicking of the icon.
When I try to use "driver.find_element_by_*" (have tried ID, Name and xpath), I get an error that it can't be found. I'm assuming it's because it wasn't there when the page loaded. Any way to resolve this?
By the way, I've been successful in scripting the login process and navigating through the site to get to this point. So, I have actually learned how to find other elements that are static.
Let me know if I need to provide more info or a better description.
Thanks,
Bill
Apparently I needed to have patience and let something catch up...
I added:
import time
and then:
time.sleep(3)
after the click on the icon to add the field. It's working!
You can use time.sleep(3), but that forces you to wait the entire 3 seconds before using the element. In Selenium we use WebDriver waits, which poll the DOM and let us use the element as soon as it becomes usable.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,""))).click()

PyCharm and/or python isn't recognizing a WebElement (Selenium) variable until I open the class in the debugger

I am getting some very odd behavior on a project. Essentially this automation creates an article on a web page using Selenium and verifies its existence by clicking/opening the article. However, when I run the program without debugging, it fails to click() the web element. So I investigate and start to debug. I notice that the web element ("target_element") is None. I continue past it, just debugging, and it fails to click "target_element" as expected.
When I rerun the program and, instead of continuing on, I open the class in the debug view, I see that "target_element" exists; so I continue on, and target_element.click() successfully loads the article on the webpage. Does anyone have any suggestions on how to fix this problem?
TL;DR: article.target_element does not exist when running the program UNLESS I set a breakpoint and look at "article" in the debugger. Then, if I continue in the program, article.target_element suddenly exists.
A possible reason is element load time.
When you run the test, Selenium runs faster than the time the element needs to load properly, and eventually throws an exception. During debugging, on the other hand, stepping through gives the element enough time to load, and Selenium finds it without any issue.
If the problem is exactly as described above, an explicit wait should be the way to go.
Example taken from the Selenium Python docs:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()

Using Selenium and Python, how to check that an element exists but is not actually visible

I have been using the find_element_by_xpath or cssSelector to locate elements on a page.
Today, I ran into a problem where the XPath of an alert message is present in the HTML but not actually visible on the site. For example, JS will display a banner message when the user enters a page, which disappears after 3 seconds.
The CSS selector span.greet will always return an element in the HTML, but that doesn't necessarily mean it is displayed on the page.
...
<span class="greet">Hello</span>
<span class="greetAgain">Hello Again!</span>
...
I read the documentation on is_Visible() but I'm not quite sure I fully understand whether it could be a solution. If not, are there other methods I could use instead?
I had a similar problem, but in my case another element was overlaying the actual element, so I found a solution using the JavaScript executor instead of clicking with WebDriver. Waiting for a fixed amount of time can cause random errors during tests.
Example click OK button:
ok_btn = self.driver.find_element_by_xpath("//button[contains(.,'OK')]")
self.driver.execute_script("arguments[0].click();", ok_btn)
After loading the page via selenium, the element may be visible when you test, but hidden after some time.
A simple way would be to wait for a fixed period of time.
You can use time.sleep to pause your script if you want to wait for the element to hide.
import time

def your_test_function():
    # initial tests (to check if elements are displayed)
    time.sleep(3)
    # delayed tests (to check if something is hidden)
If you need more control, or you want to wait for elements to be displayed / hidden you can use the Webdriver's Wait method.
http://selenium-python.readthedocs.org/en/latest/waits.html?highlight=wait
Also, you should use the is_displayed method to check if the element is visible.
http://selenium-python.readthedocs.org/en/latest/api.html?highlight=is_displayed#selenium.webdriver.remote.webelement.WebElement.is_displayed
You need to explicitly wait for visibility_of_element_located Expected Condition. Or, in other words, wait for element to become visible:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.greet")))
Resist the temptation to use time.sleep(): it is very unreliable and error-prone.
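If you need a non-blocking check for "present but hidden", find_elements (plural) returns an empty list instead of raising NoSuchElementException, and combines naturally with is_displayed(). A sketch follows; the helper name is my own, and the string "css selector" is the value By.CSS_SELECTOR expands to:

```python
def is_visible(driver, css_selector):
    """Return True only if at least one element matching the selector
    exists AND is currently displayed. find_elements returns an empty
    list (rather than raising) when nothing matches."""
    elements = driver.find_elements("css selector", css_selector)
    return any(el.is_displayed() for el in elements)
```

For the disappearing-banner case, the complementary wait is EC.invisibility_of_element_located, which resolves once the element is hidden or gone.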
