Python 3 Selenium WebDriverWait causes script to hang/freeze forever

I have a script that uses a Selenium WebDriver (geckodriver) and loads various webpages.
The script works fine at the beginning, but then at a random point it stops working without raising any error (the program sort of hangs without really doing anything).
I added some logging statements to check where it hangs, and the hang is caused by the WebDriverWait statement (see below).
The last thing that is printed in the log is "get_records - Loaded".
The expected behavior to me would be to either print "get_records - Acquired pager", or to raise a TimeoutException after 10 seconds.
[...]
logging.info("get_records - Getting url: {}".format(url))
driver.get(url)
logging.info("get_records - Loaded")
# Get records number and result pages
elem = WebDriverWait(driver, 10).until(ec.element_to_be_clickable(
(By.XPATH, "//td[@align='right']/span[@class='pager']"))
)
logging.info("get_records - Acquired pager")
[...]
Python version: 3.7.3
Selenium version: 3.141.0
Firefox version: 70.0.1
It seems like a similar bug happened with a previous version (Selenium WebDriver (2.25) Timeout Not Working), but that bug was closed.
Is anyone having the same issue?
Update:
It seems like adding time.sleep(0.5) before the elem = ... line prevents the script from freezing (either "get_records - Acquired pager" is printed, or the TimeoutException is raised).
Even though this is a workaround for the issue, I would rather not put in any forced wait.

I have had exactly the same experience: the script works fine at first but hangs forever after some time. The 10-second timeout of WebDriverWait is only checked by the Python script between polls; the timeout for each individual request that the Python script sends to the webdriver/browser is not defined. It is None by default, meaning a request will wait indefinitely.
Short answer:
driver.command_executor.set_timeout(10)
driver.get(url)
Explanation:
Take ChromeDriver as an example. Whenever you run a Selenium script, a process named 'chromedriver' starts as well; let's call it the 'control process'. It opens the browser and controls it. It also acts as an HTTP server, whose address and port you can get from driver.command_executor._url. It receives HTTP requests, processes them, tells the browser to do something (maybe open a URL) and returns a response.
When you call
elem = WebDriverWait(driver, 10).until(ec.element_to_be_clickable(
(By.XPATH, "//td[@align='right']/span[@class='pager']"))
)
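you are actually sending requests to the 'control process', which is an HTTP server, and telling it to do something (find some elements in the current page). The timeout of 10 is never sent to the 'control process': WebDriverWait calls the condition repeatedly, each call sending one or more requests (find the element, check that it is displayed and enabled), and only after a call returns does it check, on the Python side, whether 10 seconds have passed and a TimeoutException should be raised.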
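A simplified model of the loop behind WebDriverWait.until makes this concrete. It is a sketch, not Selenium's actual source: it leaves out the handling of ignored exceptions such as NoSuchElementException, and PollTimeout is a hypothetical stand-in for Selenium's TimeoutException. The clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


class PollTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException in this model."""


def until(condition, timeout, poll=0.5, clock=time.monotonic, sleep=time.sleep):
    """Simplified model of WebDriverWait(driver, timeout).until(condition).

    The deadline is checked only after condition() has returned. If one call
    blocks forever (for example an HTTP request with no socket timeout), the
    loop never reaches the deadline check, so no timeout is ever raised.
    """
    end_time = clock() + timeout
    while True:
        value = condition()  # in Selenium: one or more HTTP requests to the driver
        if value:
            return value
        sleep(poll)
        if clock() > end_time:
            raise PollTimeout("condition not met within %s seconds" % timeout)
```

With a condition that returns a truthy value on its third call, until() returns that value; with one that never does, it raises PollTimeout once the deadline has passed. A condition stuck inside a single call leaves until() stuck as well, which matches the hang described in the question.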
But what actually happens here is that the 'control process' receives the request but never responds. I don't really know what is going on inside the 'control process'.
The Python selenium package uses urllib3 to send requests, passing socket._GLOBAL_DEFAULT_TIMEOUT as the timeout. That value is a sentinel meaning "use the global default socket timeout", which is None unless socket.setdefaulttimeout() has been called, so a request can wait indefinitely. You can set a finite timeout with driver.command_executor.set_timeout(10). Now, if the 'control process' breaks, you will get a timeout exception and can, for example, recreate the webdriver.
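The effect of a missing socket timeout can be reproduced with the standard library alone. This sketch is not Selenium code: it connects to a local server that accepts the connection but never replies, which is the same situation as a 'control process' that receives a request and never answers. read_with_timeout is a hypothetical helper name. With a finite timeout, recv() gives up and raises socket.timeout; with timeout=None it would block forever, so it is only meant to be called with a finite value.

```python
import socket


def read_with_timeout(timeout):
    """Connect to a local server that never replies, then try to read one byte.

    Returns "timed out" when recv() gives up. That can only happen because a
    finite timeout is set on the client socket; with None it would block forever.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)  # the kernel completes the handshake; we never send data
    client = socket.create_connection(server.getsockname(), timeout=timeout)
    try:
        client.recv(1)
        return "got data"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        server.close()


print(read_with_timeout(0.2))
```

In a fresh interpreter socket.getdefaulttimeout() returns None, which is why a request sent with the "global default" timeout can wait forever unless a timeout is set explicitly.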

Related

Properly starting/stopping Selenium standalone server

I am using the Selenium standalone server for a remote web driver. One thing I am trying to figure out is how to start/stop it effectively. In their documentation, it says
"the caller is expected to terminate each session properly, calling either Selenium#stop() or WebDriver#quit."
What I am trying to do is figure out how to programmatically close the server, but is that even necessary? In other words, would it be okay to have the server up and running at all times, but to just close the session after each use with something like driver.quit()? Therefore when I'm not using it the server would be up but there would be no sessions.
When using the Selenium standalone server with a Remote WebDriver, you need to invoke the quit() method at the end to terminate each session properly.
As per best practices, you should invoke the quit() method within tearDown(). Invoking quit() DELETEs the current browsing session by sending the "quit" command with {"flags":["eForceQuit"]}, and finally sends a GET request to the /shutdown endpoint. Here is an example log:
1503397488598 webdriver::server DEBUG -> DELETE /session/8e457516-3335-4d3b-9140-53fb52aa8b74
1503397488607 geckodriver::marionette TRACE -> 37:[0,4,"quit",{"flags":["eForceQuit"]}]
1503397488821 webdriver::server DEBUG -> GET /shutdown
So invoking the quit() method kills the web browser session and the WebDriver instance completely.
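Outside a test framework, the same guarantee can be had with try/finally. Below is a minimal sketch using only contextlib; quitting is a hypothetical helper, not part of Selenium, and it assumes nothing about the driver except that it has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield `driver` and always call driver.quit() afterwards.

    quit() ends the WebDriver session and the browser, whereas close()
    only closes the current window.
    """
    try:
        yield driver
    finally:
        driver.quit()  # runs on normal exit and when the block raises
```

Usage would look like: with quitting(webdriver.Firefox()) as driver: driver.get("https://example.com"). In a unittest.TestCase the equivalent is calling self.driver.quit() in tearDown().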
References
You can find a couple of relevant detailed discussions in:
Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
PhantomJS web driver stays in memory
You were right. Use Selenium's driver.quit(), as it properly closes all browser windows and ends the driver's session/process. The latter is especially what you want, because you are most likely running the script headless.
I have a Selenium script running headless on a Raspberry Pi (hourly cron job). That script calls driver.quit() at the end of each iteration. When I run ps -A (to list all active processes on Unix), no active selenium/python processes are shown anymore.
Hope that satisfies your question!

Python selenium - Multiprocessing - How to close browser for an allocated process?

I know you can use driver.quit() and driver.close(), but how do I close the browser cleanly when running multiple processes?
IDEAS:
Use process ID?
To end the WebDriver and web browser session gracefully with Selenium, you should invoke the quit() method within tearDown(). Invoking quit() deletes the current browsing session by sending the quit command with {"flags":["eForceQuit"]}, and finally sends a GET request to the /shutdown endpoint. Here is the relevant log:
1503397488598 webdriver::server DEBUG -> DELETE /session/8e457516-3335-4d3b-9140-53fb52aa8b74
1503397488607 geckodriver::marionette TRACE -> 37:[0,4,"quit",{"flags":["eForceQuit"]}]
1503397488821 webdriver::server DEBUG -> GET /shutdown
So invoking the quit() method kills the web browser session and the WebDriver instance completely.
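With several processes, the simplest arrangement is for each worker to create its own driver and quit it in a finally block, so no process ever has to find or close another process's browser and no process IDs are needed. This is a sketch under assumptions: scrape_title is a hypothetical worker, and make_driver is a hypothetical zero-argument factory standing in for something like the webdriver.Firefox class.

```python
def scrape_title(url, make_driver):
    """Worker for one process: create a private driver, use it, always quit it.

    `make_driver` is a hypothetical zero-argument factory (for example the
    webdriver.Firefox class). Because quit() sits in `finally`, the browser
    is shut down even if driver.get() raises.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

With the real library, something like with multiprocessing.Pool(4) as pool: titles = pool.starmap(scrape_title, [(url, webdriver.Firefox) for url in urls]) should work, since a class is pickled by reference, but that combination is untested here.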
References
You can find a couple of relevant detailed discussions in:
Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
PhantomJS web driver stays in memory

Exception prevents webdriver from responding properly

Getting some very strange behaviour in the following example code segment:
from selenium import webdriver
try:
    driver = webdriver.Firefox(executable_path='./bin/geckodriver.exe')
    variable = input("Enter something or Ctrl+C: ")
    driver.get(variable)
except:
    pass
finally:
    driver.close()
    driver.quit()
If I enter a valid URL, the webdriver fetches the page & then the browser instance is closed.
If I enter an invalid URL, a selenium.common.exceptions.InvalidArgumentException is thrown, but the code progresses & the browser is still closed.
If I press Ctrl + C to send the SIGINT during the input statement:
The exception hits pass in the main method & proceeds to finally
Calling just driver.quit() returns None and the Firefox instance is left open
Calling driver.close() results in urllib3.exceptions.MaxRetryError: ... Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it' and the program terminates with Firefox left open
This is the simplest example I could come up with, but I'm getting the same behaviour in some code that I'm writing when WebDriverWaits are interrupted or when seemingly unrelated code throws an Exception; suddenly the webdriver instance is unresponsive. This is a problem since it leaves headless Firefox instances open. Is this a known issue when working with Selenium, or am I doing something I shouldn't be?
The Firefox version being used is Quantum v64 & Geckodriver is v0.23.0; both should be up-to-date.
Edit: Using pdb to step through the code, the driver instance is created & firefox opens, it prompts for the input and I press Ctrl+C, driver.get(variable) is not executed, the code moves to except and then to finally, and I receive a MaxRetryError out of nowhere. If I replace the input(..) line with: raise KeyboardInterrupt(), then the browser closes as expected; I'm not sure why the program has this behaviour in response to Ctrl+C.
Edit (2): I reported this as a bug on the Selenium GitHub repo. A Python version difference was suggested, but I retried under 3.7.2 (the most recent) and it still exhibits the same behaviour. Selenium, Firefox, Python and geckodriver are all now as up-to-date as they can be, and I'm still running into this issue. Hopefully this gets resolved; seemingly, this is not an issue with the code I've written.
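One detail of the snippet worth isolating is the bare except:. KeyboardInterrupt derives from BaseException, not from Exception, so only a bare except (or except BaseException) swallows Ctrl+C; except Exception lets it propagate. The stdlib-only sketch below shows the difference. The helper names are hypothetical, and this does not by itself explain why geckodriver stops responding after Ctrl+C.

```python
def caught_by_bare_except(exc):
    """Return True if a bare except: clause catches `exc`."""
    try:
        raise exc
    except:  # deliberately bare, mirroring the question's code
        return True


def caught_by_except_exception(exc):
    """Return True if except Exception: catches `exc`, False otherwise."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        return False


for exc in (ValueError("bad url"), KeyboardInterrupt()):
    print(type(exc).__name__, caught_by_bare_except(exc), caught_by_except_exception(exc))
# ValueError True True
# KeyboardInterrupt True False
```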

Python Selenium send request and avoid "Waiting for (website) ...."

I am launching several requests on different tabs. While one tab loads I will iteratively go to other tabs and see whether they have loaded correctly. The mechanism works great except for one thing: a lot of time is wasted "Waiting for (website)..."
I move from one tab to the next by raising an exception whenever a key element that I need to find is missing. But to catch this exception (and therefore move on to the other tabs, as it should), I first have to wait for the request to finish, i.e. for the "Waiting for..." message to disappear.
Would it be possible not to wait? That is, would it be possible to launch the request via browser.get(..) and then immediately change tab?
Yes, you can do that. You need to change the pageLoadStrategy of the driver. Below is an example for Firefox:
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver
# Copy first: assigning into DesiredCapabilities.FIREFOX would mutate the shared class-level dict
cap = DesiredCapabilities.FIREFOX.copy()
cap["pageLoadStrategy"] = "none"  # driver.get() returns without waiting for the page to load
print(cap)
driver = webdriver.Firefox(capabilities=cap)
driver.get("http://tarunlalwani.com")
#execute code for tab 2
#execute code for tab 3
Now nothing will wait, and it is up to you to do all the waiting. You can also use "eager" instead of "none".
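With pageLoadStrategy set to "none", one way to do the waiting yourself is to cycle through the open tabs and check each one for the key element. This is a hedged sketch, not code from the answer: first_tab_with is a hypothetical helper that relies only on driver.switch_to.window(), and on driver.find_elements(by, value), which returns an empty list instead of raising when nothing matches. The locator ("css selector", "#results") in the usage note is a placeholder.

```python
import time


def first_tab_with(driver, handles, locator, timeout=30.0, poll=0.25):
    """Round-robin over tabs until one contains an element matching `locator`.

    `locator` is a (by, value) pair such as ("css selector", "#results").
    find_elements() returns an empty list when nothing matches, so a tab that
    is still loading costs one quick check instead of an exception.
    Returns the handle of the first matching tab, or None after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for handle in handles:
            driver.switch_to.window(handle)
            if driver.find_elements(*locator):
                return handle
        time.sleep(poll)
    return None
```

A call might look like first_tab_with(driver, driver.window_handles, ("css selector", "#results"), timeout=20). Because the loop never blocks on a single tab, a slow page only delays its own check.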

Make Selenium wait 10 seconds

Yes, I know the question has been asked quite often, but I still don't get it. I want to make Selenium wait, no matter what. I tried these methods:
driver.set_page_load_timeout(30)
driver.implicitly_wait(90)
WebDriverWait(driver, 10)
driver.set_script_timeout(30)
and other things, but it does not work. I need Selenium to wait 10 seconds. NO, not until some element is loaded or whatever; just wait 10 seconds. I know there is this:
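from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
timeout = 10
try:
    element_present = EC.presence_of_element_located((By.ID, 'whatever'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")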
I do not want that.
If waiting for some seconds is too much (not achievable) for Selenium, what other (Python) libraries/programs would be capable of achieving this task? With Java's Selenium it does not seem to be a problem...
All the APIs you have mentioned are basically timeouts, so each one waits only until either some event happens or the maximum time is reached.
set_page_load_timeout - Sets the amount of time to wait for a page load to complete before throwing an error. If the timeout is negative, page loads can be indefinite.
implicitly_wait - Specifies the amount of time the driver should wait when searching for an element if it is not immediately present.
set_script_timeout - Sets the amount of time to wait for an asynchronous script to finish execution before throwing an error. If the timeout is negative, then the script will be allowed to run indefinitely.
For more information, please visit this page (the documentation is for the Java binding, but the functionality should be the same for all bindings).
So if you want Selenium (or any script) to wait 10 seconds, or any other fixed time, the best thing is to put that thread to sleep.
In Python it would be:
import time
time.sleep(10)
The simplest way to do this in Java is using
try {
    Thread.sleep(10*1000);
} catch (InterruptedException e) {
    e.printStackTrace();
}
