Exception prevents webdriver from responding properly

I'm getting some very strange behaviour from the following example code:
from selenium import webdriver
try:
    driver = webdriver.Firefox(executable_path='./bin/geckodriver.exe')
    variable = input("Enter something or Ctrl+C: ")
    driver.get(variable)
except:
    pass
finally:
    driver.close()
    driver.quit()
- If I enter a valid URL, the webdriver fetches the page & then the browser instance is closed.
- If I enter an invalid URL, a selenium.common.exceptions.InvalidArgumentException is thrown, but the code continues & the browser is still closed.
- If I press Ctrl+C to send SIGINT during the input statement, the exception hits the pass in the except block and proceeds to finally, where:
  - Calling just driver.quit() returns None, and the Firefox instance is left open.
  - Calling driver.close() results in urllib3.exceptions.MaxRetryError: ... Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it, and the program terminates with Firefox left open.
This is the simplest example I could come up with, but I'm getting the same behaviour in some code I'm writing when WebDriverWait calls are interrupted, or when seemingly unrelated code throws an exception: suddenly the webdriver instance is unresponsive. This is a problem because it leaves headless Firefox instances open. Is this a known issue when working with Selenium, or am I doing something I shouldn't be?
The Firefox version being used is Quantum v64 & Geckodriver is v0.23.0; both should be up-to-date.
Edit: Stepping through the code with pdb, the driver instance is created and Firefox opens, the input prompt appears and I press Ctrl+C, driver.get(variable) is not executed, the code moves to except and then to finally, and I receive a MaxRetryError out of nowhere. If I replace the input(..) line with raise KeyboardInterrupt(), the browser closes as expected, so I'm not sure why the program behaves this way in response to Ctrl+C.
Edit (2): I reported this as a bug on the Selenium GitHub repo. A Python version difference was suggested, but I retried under 3.7.2 (the most recent release) and the same behaviour persists. Selenium, Firefox, Python and Geckodriver are now all as up to date as they can be, and I'm still running into this issue. Hopefully it gets resolved; it seemingly isn't a problem with the code I've written.
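A likely explanation, offered as a hedged guess rather than a confirmed diagnosis: on Windows, pressing Ctrl+C in a console delivers the interrupt to every process attached to that console, which would include the geckodriver.exe child that Selenium started. If geckodriver exits first, the later close()/quit() calls have nothing left to connect to, which would match the "actively refused" MaxRetryError, while raise KeyboardInterrupt() only affects the Python process. Separately from that, the cleanup in the snippet can be made sturdier. The sketch below uses a hypothetical helper, run_with_cleanup (not a Selenium API): it defines driver before the try so finally can't hit a NameError when the constructor itself fails, catches KeyboardInterrupt explicitly instead of using a bare except, and calls only quit(), which already closes every window.

```python
def run_with_cleanup(make_driver, action):
    """Hypothetical helper: create a driver, run action(driver), always try quit()."""
    driver = None  # defined up front so `finally` never raises NameError
    try:
        driver = make_driver()
        action(driver)
    except KeyboardInterrupt:
        # Handle Ctrl+C explicitly instead of swallowing everything with `except: pass`
        print("Interrupted by user")
    finally:
        if driver is not None:
            try:
                driver.quit()  # closes all windows and ends the session
            except Exception as exc:
                # If the driver process is already gone, quit() cannot reach it;
                # report it instead of replacing the original error with a new one
                print(f"quit() failed: {exc!r}")
```

For the example in the question, that could look like run_with_cleanup(lambda: webdriver.Firefox(executable_path='./bin/geckodriver.exe'), lambda d: d.get(input("Enter something or Ctrl+C: "))). This does not stop Ctrl+C from reaching geckodriver; it only keeps the cleanup path from crashing or masking the real exception.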

Related

Properly starting/stopping Selenium standalone server

I am using the Selenium standalone server for a remote web driver. One thing I am trying to figure out is how to start/stop it effectively. Their documentation says:
"the caller is expected to terminate each session properly, calling either Selenium#stop() or WebDriver#quit."
What I am trying to do is figure out how to programmatically close the server, but is that even necessary? In other words, would it be okay to have the server up and running at all times, but to just close the session after each use with something like driver.quit()? Therefore when I'm not using it the server would be up but there would be no sessions.

When using the Selenium standalone server as a Remote WebDriver, you need to invoke the quit() method at the end to terminate each session properly.
As a best practice, you should invoke quit() within tearDown(). Invoking quit() sends a DELETE for the current browsing session, which sends the "quit" command with {"flags":["eForceQuit"]} to the browser, and finally sends a GET request to the /shutdown endpoint. Here is an example log:
1503397488598 webdriver::server DEBUG -> DELETE /session/8e457516-3335-4d3b-9140-53fb52aa8b74
1503397488607 geckodriver::marionette TRACE -> 37:[0,4,"quit",{"flags":["eForceQuit"]}]
1503397488821 webdriver::server DEBUG -> GET /shutdown
So on invoking quit(), the web browser session and the WebDriver instance are killed completely.
References
You can find a couple of relevant detailed discussions in:
Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
PhantomJS web driver stays in memory

You were right. Use Selenium's driver.quit(), as it properly closes all browser windows and ends the driver's session/process. The latter is especially what you want, because you are most likely running the script headless.
I have a Selenium script running headless on a Raspberry Pi (an hourly cron job). That script calls driver.quit() at the end of each iteration. When I run ps -A (to list all active processes on Unix), no active Selenium/Python processes are shown anymore.
Hope that satisfies your question!

Multiple Chrome instances with Selenium: "Closing stream with result -2" Error in Instance 1 when running driver.close/quit in Instance 2

I'm running a combination of BeautifulSoup and Selenium for the following scenario:
Open Webdriver (Chrome) with Selenium, waiting for commands.
Infinite loop of BeautifulSoup in the background to check website for changes every few seconds
If a certain change is detected, use Selenium to open/load website and click a few buttons to render content I want to scrape.
I am tracking changes on two different websites. I could just test the two websites in the same script, but I need to track the changes as soon as possible. Therefore, I'm also opening Selenium at the beginning of the scripts to save time before changes are detected so that I don't have to wait for Chrome to first boot up as soon as a change is detected.
I need the cookies of my default Chrome profile to access the website (two-factor login), so I copied my User Data folder to load two Chrome sessions. If I don't do that, I get an error in the second instance saying that the profile is in use and its files can't be edited.
Opening one instance/list/website is fine, my script runs without any issues.
When I open a second instance of the script, checking a different website in a new window of Chrome, the first script gives me the following error whenever the second script initiates driver.close() or driver.quit().
[22384:16640:0909/170836.278:ERROR:socket_stream.cc(219)] Closing stream with result -2
This doesn't shut down my first script; it seems to keep working fine. So I don't think this has any detrimental impact on what I'm trying to do, but I wonder what is happening here and why the two instances seem to interact with each other when they shouldn't.
I'm also using a separate copy of chromedriver.exe for each script, though I'm not sure that's necessary.
Anyway, I'd appreciate it if someone could enlighten me about what is going on.
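The separate-profile setup described in the question can be sketched as follows, using Chrome's --user-data-dir switch so each concurrently running instance owns its own copy of the profile. The helper names clone_profile, chrome_args_for and start_chrome are hypothetical, and the paths you pass in are up to you. Selenium is imported inside start_chrome so the profile helpers work without a browser.

```python
import shutil
from pathlib import Path
from typing import List


def clone_profile(source: Path, target: Path) -> Path:
    """Copy a Chrome 'User Data' folder once, so each instance owns its copy."""
    if not target.exists():
        shutil.copytree(source, target)
    return target


def chrome_args_for(profile_dir: Path) -> List[str]:
    # Each concurrently running Chrome needs a distinct user-data directory
    return [f"--user-data-dir={profile_dir}"]


def start_chrome(profile_dir: Path):
    from selenium import webdriver  # lazy import: only needed to launch Chrome
    opts = webdriver.ChromeOptions()
    for arg in chrome_args_for(profile_dir):
        opts.add_argument(arg)
    return webdriver.Chrome(options=opts)
```

Each script would then call something like start_chrome(clone_profile(Path(r"C:\path\to\User Data"), Path(r"C:\chrome-profiles\site-a"))) with its own target folder.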

This error message...
ERROR:socket_stream.cc(219)] Closing stream with result -2
...implies that an error occurred while invoking CloseStream(). (In Chromium's list of network error codes, -2 is net::ERR_FAILED, a generic failure.)
The message comes from the SocketInputStream::CloseStream() function in socket_stream.cc, which is defined as:
void SocketInputStream::CloseStream(net::Error error,
                                    const base::Closure& callback) {
  DCHECK_LT(error, net::ERR_IO_PENDING);
  ResetInternal();
  last_error_ = error;
  LOG(ERROR) << "Closing stream with result " << error;
  if (!callback.is_null())
    callback.Run();
}

Python 3 Selenium WebDriverWait causes script to hang/freeze forever

I have a script that uses a selenium webdriver (geckodriver) and loads various webpages.
The script works fine at the beginning, but then at a random point it stops working without raising any error (the program sort of hangs without really doing anything).
I added some logging statements to check where it hangs, and it's caused by the WebDriverWait statement (see below).
The last thing printed in the log is "get_records - Loaded".
The behavior I'd expect is either printing "get_records - Acquired pager", or raising a TimeoutException after 10 seconds.
[...]
logging.info("get_records - Getting url: {}".format(url))
driver.get(url)
logging.info("get_records - Loaded")
# Get records number and result pages
elem = WebDriverWait(driver, 10).until(ec.element_to_be_clickable(
    (By.XPATH, "//td[@align='right']/span[@class='pager']"))
)
logging.info("get_records - Acquired pager")
[...]
Python version: 3.7.3
Selenium version: 3.141.0
Firefox version: 70.0.1
It seems like a similar bug happened with a previous version (Selenium WebDriver (2.25) Timeout Not Working), but that bug was closed.
Is anyone having the same issue?
Update:
It seems that adding time.sleep(0.5) before elem prevents the script from freezing (either "get_records - Acquired pager" is printed, or the TimeoutException is raised).
Even though this is a workaround for the issue, I would rather not put in any forced wait.

I had exactly the same experience: the script works fine at first but hangs forever after some time. The 10-second timeout belongs to WebDriverWait; it limits how long Python keeps polling the condition, but it does not limit how long each individual request from the Python script to the webdriver/browser may take. That per-request timeout is not defined, and it's None by default, meaning a request will wait indefinitely.
Short answer:
driver.command_executor.set_timeout(10)
driver.get(url)
Explain:
Take ChromeDriver as an example. Whenever you run a Selenium script, a process named 'chromedriver' starts as well; let's call it the 'control process'. It opens the browser and controls it. It also acts as an HTTP server, whose address and port you can get from driver.command_executor._url. It receives HTTP requests, processes them, tells the browser to do something (maybe open a URL), and returns. Details here.
When you call
elem = WebDriverWait(driver, 10).until(ec.element_to_be_clickable(
    (By.XPATH, "//td[@align='right']/span[@class='pager']"))
)
you are actually sending requests to the 'control process', which is an HTTP server, telling it to do something (find some elements in the current page). WebDriverWait repeats that request (every 0.5 seconds by default) until the condition is met; the '10' means that after 10 seconds without success it stops polling and raises a TimeoutException in the Python script.
But what really happens here is that the 'control process' receives a request but never responds. I don't really know what's happening inside the 'control process'. Because WebDriverWait only checks its deadline between requests, a single request that never returns blocks it forever.
The Python selenium package uses urllib3 to send requests, with socket._GLOBAL_DEFAULT_TIMEOUT as the timeout. That sentinel means "use the socket module's default timeout", which is None unless you change it, so a request can wait forever. You can set it with driver.command_executor.set_timeout(10). Now if the 'control process' breaks, you will get a timeout exception and can recreate the webdriver.
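To see why one hung request defeats the 10-second limit, here is a simplified model of a client-side polling wait. It is not Selenium's source, only a sketch with the same shape, and the names poll_until and WaitTimeout are made up: the deadline is compared only after condition() returns, so if one call blocks on a socket with no timeout, the loop never reaches the check.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition stays falsy past the deadline."""


def poll_until(condition, timeout, poll_frequency=0.5):
    end_time = time.monotonic() + timeout
    while True:
        # One round trip to the driver; if it never returns, nothing below runs
        value = condition()
        if value:
            return value
        # The deadline is only checked between calls
        if time.monotonic() > end_time:
            raise WaitTimeout(f"condition not met within {timeout}s")
        time.sleep(poll_frequency)
```

Bounding each request, as driver.command_executor.set_timeout(10) is meant to do, is what lets control return to the deadline check.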

How can I tell ChromeDriver to wait longer for Chrome to launch before giving up?

Background
I'm using Selenium and Python to automate display and navigation of a website in Chromium on Ubuntu MATE 16.04 on a Raspberry Pi 3. (Think unattended digital signage.) This combination was working great until today when the newest version of Chromium (with matching ChromeDriver) installed via automatic updates.
Because Chromium needed to perform some upgrade housekeeping tasks the next time it started up, it took a little longer than usual. Keep in mind that this is on a Raspberry Pi, so I/O is severely bottlenecked by the SD card. Unfortunately, it took long enough that my Python script failed because the ChromeDriver gave up on Chromium ever starting:
Traceback (most recent call last):
  File "call-tracker-start", line 15, in <module>
    browser = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
  (Driver info: chromedriver=2.35 (0),platform=Linux 4.4.38-v7+ armv7l)
Of course, when the script dies after throwing this exception, the Chromium instance is killed before it can finish its housekeeping, which means that next time it has to start over, so it takes just as long as the last time and fails just as hard.
If I then manually intervene and run Chromium as a normal user, I just... wait... a minute... or two, for Chromium to finish its upgrade housekeeping, then it opens its browser window, and then I cleanly quit the application. Now that the housekeeping is done, Chromium starts up at a more normal speed the next time, so all of a sudden my Python script runs without any error because ChromeDriver sees Chromium finish launching within its accepted timeout window.
Everything will likely be fine until the next automatic update comes down, and then this same problem will happen all over again. I don't want to have to manually intervene after every update, nor do I want to disable automatic updates.
The root of the question
How can I tell ChromeDriver not to give up so quickly on launching Chromium?
I looked for some sort of timeout value that I could set, but I couldn't find any in the ChromeDriver or Selenium for Python documentation.
Interestingly, there is a timeout argument that can be passed to the Firefox WebDriver, as shown in the Selenium for Python API documentation:
timeout – Time to wait for Firefox to launch when using the extension connection.
This parameter is also listed for the Internet Explorer WebDriver, but it's notably absent in the Chrome WebDriver API documentation.
I also wouldn't mind passing something directly to ChromeDriver via service_args, but I couldn't find any relevant options in the ChromeDriver docs.
Update: found root cause of post-upgrade slowness
After struggling with finding a way to reproduce this problem in order to test solutions, I was able to pinpoint the reason Chromium takes forever to launch after an upgrade.
It seems that, as part of its post-upgrade housekeeping, Chromium rebuilds the user's font cache. This is a CPU & I/O intensive process that is especially hard on a Raspberry Pi and its SD card, hence the extreme 2.5 minute launch time whenever the font cache has to be rebuilt.
The problem can be reproduced by purposely deleting the font cache, which forces a rebuild:
pi@rpi-dev1:~$ killall chromium-browser
pi@rpi-dev1:~$ time chromium-browser --headless --disable-gpu --dump-dom 'about:blank'
[0405/132706.970822:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
<html><head></head><body></body></html>
real 0m0.708s
user 0m0.340s
sys 0m0.200s
pi@rpi-dev1:~$ rm -Rf ~/.cache/fontconfig
pi@rpi-dev1:~$ time chromium-browser --headless --disable-gpu --dump-dom 'about:blank'
[0405/132720.917590:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
<html><head></head><body></body></html>
real 2m9.449s
user 2m8.670s
sys 0m0.590s

You are right, there is no option to explicitly set the timeout for the initial driver creation, at least none that I could find. I would recommend visiting their GitHub page and creating a new issue; it also links to the ChromeDriver site in case you want to file a bug there.
You could try something like this in the meantime though:
import webbrowser
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
try:
    driver = webdriver.Chrome()
except WebDriverException:
    webbrowser.open_new('http://www.Google.com')
    # Let this try and get Chrome open, then go back and use webdriver
Here is the documentation on webbrowser:
https://docs.python.org/3/library/webbrowser.html

Without your code trials it would be tough to analyze the reason behind the error you are seeing:
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
More details about the versions of the binaries you are using would have helped.
In practice, asking ChromeDriver to wait longer for Chrome to launch probably won't help, as ChromeDriver's default configuration should cover typical needs.
However, WebDriverException: Message: chrome not reachable is a pretty common issue when the binary versions are incompatible. You can find a detailed discussion of this issue at org.openqa.selenium.WebDriverException: chrome not reachable - when attempting to start a new session

The bad news
It turns out that not only is there no timeout option for Selenium to pass to ChromeDriver, but short of recompiling your own custom ChromeDriver, there is currently no way to change this value programmatically whatsoever. Sadly, looking at the source code shows that Google has hard-coded a timeout value of 60 seconds!
from chromium /src/chrome/test/chromedriver/chrome_launcher.cc#208:
std::unique_ptr<DevToolsHttpClient> client(new DevToolsHttpClient(
    address, context_getter, socket_factory, std::move(device_metrics),
    std::move(window_types), capabilities->page_load_strategy));
base::TimeTicks deadline =
    base::TimeTicks::Now() + base::TimeDelta::FromSeconds(60);
Status status = client->Init(deadline - base::TimeTicks::Now());
Until this code is changed to allow custom deadlines, the only option is a workaround.
The workaround
I ended up taking an approach that "primed" Chromium before having Selenium call ChromeDriver. This gets the one-time, post-upgrade slow start out of the way before ChromeDriver ever begins its countdown. The answer @PixelEinstein gave helped lead me down the right path, but this solution differs in two ways:
1. The call to open standalone Chromium here is blocking, while webbrowser.open_new() is not.
2. Standalone Chromium is always launched before ChromeDriver, whether it is needed or not. I did this because waiting one minute for ChromeDriver to time out, then waiting another 2.5 minutes for Chromium to start, then trying ChromeDriver again created a total delay of just over 3.5 minutes. Launching Chromium as the first action brings the total wait time down to about 2.5 minutes, as you skip the initial ChromeDriver timeout. On occasions when the long startup time doesn't occur, this "double loading" of Chromium is negligible, as the whole process finishes in a matter of seconds.
Here's the code snippet:
#!/usr/bin/env python3
import subprocess
from selenium import webdriver
some_site = 'http://www.google.com'
chromedriver_path = '/usr/lib/chromium-browser/chromedriver'
# Block until Chromium finishes launching and self-terminates
subprocess.run(['chromium-browser', '--headless', '--disable-gpu', '--dump-dom', 'about:blank'])
browser = webdriver.Chrome(executable_path=chromedriver_path)
browser.get(some_site)
# Continue on with your Selenium business...
Before instantiating a webdriver.Chrome() object, this waits for Chromium to finish its post-upgrade housekeeping no matter how long it takes. Chromium is launched in headless mode where --dump-dom is a one-shot operation that writes the requested web page (in this case about:blank) to stdout, which is ignored. Chromium self-terminates after completing the operation, which then returns from the subprocess.run() call, unblocking program flow. After that, it's safe to let ChromeDriver start its countdown, as Chromium will launch in a matter of seconds.
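A variant of the priming step is sketched below, with two changes that are assumptions of this sketch rather than part of the answer: output is sent to DEVNULL (without redirection, subprocess.run lets the dumped DOM print to the terminal), and a timeout caps the wait so a Chromium that never finishes can't block the script forever. The helper name prime_browser and the 300-second cap are arbitrary choices; the cap just comfortably exceeds the roughly two-minute font-cache rebuild measured above.

```python
import subprocess
import time

# The priming command from the answer: headless, one-shot, self-terminating
CHROMIUM_PRIME_CMD = ('chromium-browser', '--headless', '--disable-gpu',
                      '--dump-dom', 'about:blank')


def prime_browser(cmd=CHROMIUM_PRIME_CMD, timeout=300):
    """Run a one-shot browser command to completion; return seconds elapsed.

    If the command runs longer than `timeout` seconds, subprocess.run kills
    it and raises subprocess.TimeoutExpired instead of blocking forever.
    """
    start = time.monotonic()
    subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, timeout=timeout, check=False)
    return time.monotonic() - start
```

In the script above, prime_browser() would replace the bare subprocess.run(...) line, right before webdriver.Chrome(executable_path=chromedriver_path) is created.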

Python3, Selenium, Chromedriver console window

I've made a Selenium test using Python 3 and the selenium library.
I've also used Tkinter to make a GUI for entering some input (account, password, ...).
I've managed to hide the Python console window by saving the script with the .pyw extension, and when I make an executable from my code, the console doesn't show up even if it's saved with the .py extension.
However, every time chromedriver starts, it also opens a console window, and when the driver exits, this window doesn't close.
So in a loop, I'm left with many webdriver consoles.
Is there a workaround to prevent the driver from launching a console every time it runs?

I hated dealing with this in Selenium until I remembered that this is an obvious use case for context managers, just like the usage of open.
I did find out that Selenium is about to add this officially to their package in this pull request.
Until this is officially added, this snippet should give you the functionality you need to get things going :)
import contextlib
from selenium import webdriver

@contextlib.contextmanager
def Chrome(*args, **kwargs):
    # Use a separate local name so the selenium `webdriver` module isn't shadowed
    driver = webdriver.Chrome(*args, **kwargs)
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the with-block raises

with Chrome() as driver:
    # whatever you're planning on doing goes here
    pass

driver.close() and driver.quit() are two different methods for closing the browser session in Selenium WebDriver.
driver.close() closes the browser window that currently has focus.
driver.quit() closes all the browser windows and ends the WebDriver session gracefully.
You should use driver.quit() whenever you want to end the program. It closes all open browser windows and terminates the WebDriver session. If you don't call driver.quit() at the end of the program, the WebDriver session won't close properly and its resources (such as the driver and browser processes) may not be freed, which can lead to memory leaks.
