Working with Selenium and multiple Chrome Browsers on Linux - python

I have developed a python application for Selenium using Chrome/ChromeDriver.
The application works well on my Windows-based laptop, but after moving everything to my Linux-based server I started to notice strange behavior when running multiple browser instances in parallel.
The approach I am currently using is pretty simple: I have one daemon launching a separate process instance with Popen for each sub-process, as follows:
sub_call = subprocess.Popen(["python", "my_script.py"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            text=True)
(Note: text=True and universal_newlines=True are aliases, so only one is needed.)
Ideally each sub-process should run in parallel in a separate Chrome session, with its own browser parameters (cookies, proxies, etc.).
No synchronization mechanisms are implemented among the sub-processes.
For now I would prefer to avoid Selenium Grid because I only plan to run a few instances in parallel (say, fewer than 5), and I have hardware constraints since everything runs inside Docker on a NAS.
This is the code I use to instantiate every ChromeDriver object:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
chrome_options.add_argument('--user-agent={}'.format(constants.BROWSER_AGENT_HEADER))
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--remote-debugging-port={}'.format(CHROME_DRIVER_DEBUG_PORT))
chrome_options.add_argument('--window-size={}'.format(CHROME_DRIVER_RESOLUTION))
if ON_POSIX:
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # chrome_options.add_argument('--remote-debugging-address={}'.format(CHROME_DRIVER_DOCKER_BIND))
chrome_options.add_argument('--proxy-server=http://{}:{}'.format(self.proxy_ip, self.proxy_port))
cap = webdriver.DesiredCapabilities.CHROME
cap['proxy'] = {
    "httpProxy": proxy_address_str,
    "httpsProxy": proxy_address_str,
    "ftpProxy": proxy_address_str,
    "sslProxy": proxy_address_str,
    "proxyType": "MANUAL",
}
cap['goog:loggingPrefs'] = {'performance': 'ALL'}
service_args = ["--log-path={}/chromedriver.log".format(LOGFILE_PATH),
                "--whitelisted-ips=",
                "--profile-directory={}".format(username)]
self.driver = webdriver.Chrome(executable_path=constants.CHROME_DRIVER_PATH,
                               service_args=service_args,
                               desired_capabilities=cap,
                               options=chrome_options)
Here I tried to separate each user environment with the --profile-directory argument (note: I just used my system usernames, without actually setting up any Chrome profiles).
However, while on Windows each process opens its own Chrome window, on Linux there are conflicts.
The major issue is that when a process ends, executing even one of the following lines affects the other processes' browsers in some way:
driver.quit()
driver.close()
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
I have temporarily commented out each of the above lines, and everything seems to work perfectly (10 hours running at normal duty, no issues).
My concern now is that the Chrome instances left open will eventually consume all my memory.
How can I clean up a browser without affecting my other running processes?
Thank you in advance for any support I receive.
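A likely cause of the cross-process conflicts on Linux (an assumption worth testing, not confirmed from the logs above) is that every instance shares the same Chrome user data directory: --profile-directory only selects a profile inside that directory, so parallel instances still collide on it. A minimal sketch of giving each sub-process a fully private directory via --user-data-dir; isolated_chrome_args is a hypothetical helper name:

```python
import tempfile

def isolated_chrome_args(username):
    """Build per-process Chrome arguments with a private user data dir.

    Chrome instances that share a user data dir interfere with each
    other; --profile-directory alone does not isolate them.
    """
    # A unique temp dir per process avoids all sharing between instances.
    user_data_dir = tempfile.mkdtemp(prefix="chrome-{}-".format(username))
    return [
        "--headless",
        "--user-data-dir={}".format(user_data_dir),  # unique per process
    ]

# Each subprocess would then do something like:
#   chrome_options = webdriver.ChromeOptions()
#   for arg in isolated_chrome_args(username):
#       chrome_options.add_argument(arg)
```

With isolated directories, one process calling driver.quit() should no longer touch the browsers owned by its siblings.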

Related

selenium: finding it difficult to open a new session in an existing browser window via geckodriver

I'm trying to develop memory-efficient scripts. I believe using a single instance (window) of Firefox would speed up execution, decrease memory usage, and therefore increase the efficiency of Selenium scripts in Python (by enabling more scripts to run simultaneously). It would also make browser sessions more organized and easier to use.
I have tried adding the --connect-existing argument to the options variable:
options.add_argument("--connect-existing")
driver = webdriver.Firefox(options=options)
but still saw a new window open.
I tried these solutions: https://stackoverflow.com/a/37964479
https://stackoverflow.com/a/73964036/20356446
But they also fail to run.

Close all webdriver sessions using Python

I am running a Flask server and I have the following problem.
When a user logs in, a Selenium webdriver is initialized and performs some tasks. It stores some cookies and then communicates with the frontend (I can't control WHEN it will save the cookies, and I cannot close it with driver.close()). After this I need to start the chromedriver again while preserving the cookies (I'm using a user data dir for this reason, and it works).
The problem is that the second time I start the webdriver I get an error because the previous one was not closed. How can I close it before starting a new one, using Python?
Thanks!
I expect to close all the active chromedriver sessions using Python, but without using Selenium (I cannot use it).
Ideally, you want driver.quit() to gracefully close all browser windows (note that driver.close() only closes the window in focus). If that's not possible, you can use psutil to kill the chromedriver processes:
import psutil

for proc in psutil.process_iter():
    if 'chromedriver' in proc.name():
        proc.kill()
A similar approach can work with the built-in subprocess module, though perhaps with a few extra steps depending on your OS.
You can save the cookies in a txt file, and every time you run the driver, re-add them with driver.get_cookies() and driver.add_cookie(), using this structure:
content_cookies = {
    'name': <NAME_VARIABLE_COOKIE>,
    'domain': '<URLSITE>',
    'value': str(<VALUE_VARIABLE_COOKIE>),
}
driver.add_cookie(content_cookies)
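Building on the structure above, persisting every cookie to a JSON file between driver sessions can be sketched like this; save_cookies and load_cookies are hypothetical helper names, and note that add_cookie() only accepts a cookie once the driver is already on that cookie's domain:

```python
import json

def save_cookies(driver, path):
    """Dump all current session cookies to a JSON file."""
    with open(path, "w") as f:
        json.dump(driver.get_cookies(), f)

def load_cookies(driver, path):
    """Re-add previously saved cookies to a fresh driver session.

    The driver must already have navigated to the cookies' domain,
    or add_cookie() will reject them.
    """
    with open(path) as f:
        for cookie in json.load(f):
            driver.add_cookie(cookie)
```

Call save_cookies() just before tearing the old driver down, then driver.get(<site>) followed by load_cookies() in the new session.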

Change proxy settings while chrome is initiliazed

I have a bunch of questions; please answer as many of them as you can.
Question 1:
How to change proxy settings after chrome has been launched?
Reason:
In fact, I can't wait for Chrome to relaunch to change the proxy setting every time; I want to inject some code via the inspector, or use some Selenium driver method, to change the proxy.
This is possible in normal Chrome using extensions, but ChromeDriver can't click on an extension, and it's not the most convenient way anyway since my VPN does not have a Chrome extension.
Question 2:
Is it possible to add a list of proxies to auto rotate for every website open?
Question 3:
Is it possible to run each Chrome tab with a separate IP/proxy?
Question 4:
Can you tell me what matters most for running Chrome flawlessly with multiple instances?
I.e., I want to run multiple tests at once, maybe 100s; my RAM is 16 GB but the computer still becomes slow with more instances (CPU: i5 8th gen).
(1) Use it like this:
from selenium import webdriver

PROXY = "11.456.448.110:8080"
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % PROXY)
chrome = webdriver.Chrome(options=chrome_options)
chrome.get("https://www.google.com")
(Note: webdriver.ChromeOptions(), not WebDriver.ChromeOptions(), and the chrome_options keyword is deprecated in favor of options.)
(2)
Is it possible to add a list of proxies to auto rotate for every website open?
Yes and no. Yes, you can create a new WebDriver instance with a different proxy and then open your web page; no, you cannot change the proxy of a WebDriver instance after it has opened.
(3)
Is it possible to run each Chrome tab with a separate IP/proxy?
No.
(4)
Can you tell me what matters most for running Chrome flawlessly with multiple instances? I.e., I want to run multiple tests at once, maybe 100s; my RAM is 16 GB but the computer still becomes slow with more instances (CPU: i5 8th gen).
Initialize many WebDriver instances, then run them inside Python threads.
See
https://www.browserstack.com/guide/set-proxy-in-selenium
https://www.guru99.com/sessions-parallel-run-and-dependency-in-selenium.html
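As a sketch of point (4), one way to keep "many WebDrivers in Python threads" manageable is a small thread pool with one short-lived driver per task. Here driver_factory is an assumed zero-argument callable returning a fresh driver (e.g. a function wrapping webdriver.Chrome with your options), and run_tests is a hypothetical helper:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tests(driver_factory, urls, max_workers=4):
    """Visit each URL with its own driver, a few at a time.

    Capping max_workers is what keeps RAM/CPU under control: only
    that many browsers are ever alive at once, regardless of how
    many tasks are queued.
    """
    def worker(url):
        driver = driver_factory()
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()  # always release the browser, even on error

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))
```

For 100s of tests on a 16 GB machine, a small max_workers with a queue of tasks is usually far more stable than launching 100 browsers simultaneously.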

How to clean up all Selenium Firefox Processes

I've created a web scraper with Python (3.6) and a Selenium Firefox web driver. I've set up a cronjob to run this scraper every few minutes, and it all seems to be working, except that over time (a few days), the memory on my Ubuntu VPS (8 GB RAM, Ubuntu 18.04.4) fills up and it crashes.
When I check htop, I can see lots (as in, hundreds) of Firefox processes like "/usr/lib/firefox -marionette" and "/usr/lib/firefox -contentproc", each taking up about 3 or 4 MB of memory.
I've put
browser.stop_client()
browser.close()
browser.quit()
in every function that uses the web driver, but I suspect the script sometimes leaves web drivers open when it hits an error, without closing them properly, and these Firefox processes just accumulate until my system crashes.
I'm working on finding the root cause of this, but in the meantime, is there a quick way I can kill/clean up all these processes?
e.g. a cronjob that kills all matching processes (older than 10 minutes)?
Thanks.
I suspect the script is sometimes leaving web drivers open when it hits an error, and not closing them properly
This is most likely the issue. You can fix it by using try/except/finally blocks:
browser = webdriver.Firefox()
try:
    # Your code
    pass
except Exception as e:
    # Log or print the error
    pass
finally:
    browser.close()
    browser.quit()
And if you still face the same issue, you can force-kill the driver process, as per this answer, or this answer for Ubuntu:
import os

# Windows:
os.system("taskkill /im geckodriver.exe /f")
# Linux equivalent:
# os.system("pkill -f geckodriver")
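For the cronjob idea in the question, killing only matching processes older than 10 minutes, the age check can be sketched as a small pure helper (is_stale is a hypothetical name); the commented loop shows how it might drive the cleanup, assuming the third-party psutil package is installed:

```python
import time

MAX_AGE_SECONDS = 10 * 60  # kill anything older than 10 minutes

def is_stale(create_time, now=None):
    """True if a process started more than MAX_AGE_SECONDS ago.

    create_time is a Unix timestamp, as returned by
    psutil.Process.create_time().
    """
    if now is None:
        now = time.time()
    return (now - create_time) > MAX_AGE_SECONDS

# With psutil, a cron-driven cleanup script could look like:
#   import psutil
#   for proc in psutil.process_iter(["name", "create_time"]):
#       if "firefox" in (proc.info["name"] or "") and is_stale(proc.info["create_time"]):
#           proc.kill()
```

Sweeping only stale processes avoids killing the browser a currently running scraper job is still using.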

How can I tell ChromeDriver to wait longer for Chrome to launch before giving up?

Background
I'm using Selenium and Python to automate display and navigation of a website in Chromium on Ubuntu MATE 16.04 on a Raspberry Pi 3. (Think unattended digital signage.) This combination was working great until today when the newest version of Chromium (with matching ChromeDriver) installed via automatic updates.
Because Chromium needed to perform some upgrade housekeeping tasks the next time it started up, it took a little longer than usual. Keep in mind that this is on a Raspberry Pi, so I/O is severely bottlenecked by the SD card. Unfortunately, it took long enough that my Python script failed because the ChromeDriver gave up on Chromium ever starting:
Traceback (most recent call last):
  File "call-tracker-start", line 15, in <module>
    browser = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/home/pi/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
  (Driver info: chromedriver=2.35 (0),platform=Linux 4.4.38-v7+ armv7l)
Of course, when the script dies after throwing this exception, the Chromium instance is killed before it can finish its housekeeping, which means that next time it has to start over, so it takes just as long as the last time and fails just as hard.
If I then manually intervene and run Chromium as a normal user, I just... wait... a minute... or two, for Chromium to finish its upgrade housekeeping, then it opens its browser window, and then I cleanly quit the application. Now that the housekeeping is done, Chromium starts up the next time at a more normal speed, so all of the sudden my Python script runs without any error because the ChromeDriver sees Chromium finish launching within its accepted timeout window.
Everything will likely be fine until the next automatic update comes down, and then this same problem will happen all over again. I don't want to have to manually intervene after every update, nor do I want to disable automatic updates.
The root of the question
How can I tell ChromeDriver not to give up so quickly on launching Chromium?
I looked for some sort of timeout value that I could set, but I couldn't find any in the ChromeDriver or Selenium for Python documentation.
Interestingly, there is a timeout argument that can be passed to the Firefox WebDriver, as shown in the Selenium for Python API documentation:
timeout – Time to wait for Firefox to launch when using the extension connection.
This parameter is also listed for the Internet Explorer WebDriver, but it's notably absent in the Chrome WebDriver API documentation.
I also wouldn't mind passing something directly to ChromeDriver via service_args, but I couldn't find any relevant options in the ChromeDriver docs.
Update: found root cause of post-upgrade slowness
After struggling with finding a way to reproduce this problem in order to test solutions, I was able to pinpoint the reason Chromium takes forever to launch after an upgrade.
It seems that, as part of its post-upgrade housekeeping, Chromium rebuilds the user's font cache. This is a CPU & I/O intensive process that is especially hard on a Raspberry Pi and its SD card, hence the extreme 2.5 minute launch time whenever the font cache has to be rebuilt.
The problem can be reproduced by purposely deleting the font cache, which forces a rebuild:
pi@rpi-dev1:~$ killall chromium-browser
pi@rpi-dev1:~$ time chromium-browser --headless --disable-gpu --dump-dom 'about:blank'
[0405/132706.970822:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
<html><head></head><body></body></html>

real 0m0.708s
user 0m0.340s
sys  0m0.200s
pi@rpi-dev1:~$ rm -Rf ~/.cache/fontconfig
pi@rpi-dev1:~$ time chromium-browser --headless --disable-gpu --dump-dom 'about:blank'
[0405/132720.917590:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
<html><head></head><body></body></html>

real 2m9.449s
user 2m8.670s
sys  0m0.590s
You are right: there is no option to explicitly set the timeout of the initial driver creation. I would recommend visiting their git page HERE and creating a new issue. It also has links to the ChromeDriver site in case you want to file a bug there. Currently there is no timeout option that I could find.
You could try something like this in the meantime, though:
import webbrowser

from selenium import webdriver
from selenium.common.exceptions import WebDriverException

try:
    driver = webdriver.Chrome()
except WebDriverException:
    webbrowser.open_new('http://www.google.com')
    # Let this try to get Chrome open, then go back and use webdriver
Here is the documentation on webbrowser:
https://docs.python.org/3/library/webbrowser.html
Without a code trial it is tough to analyze the reason behind the error you are seeing:
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
More details about the versions of the binaries you are using would also have helped.
Frankly, asking ChromeDriver to wait longer for Chrome to launch before giving up won't help, as the default ChromeDriver configuration already covers typical needs.
However, WebDriverException: Message: chrome not reachable is a pretty common issue when the binary versions are incompatible. You can find a detailed discussion of it at org.openqa.selenium.WebDriverException: chrome not reachable - when attempting to start a new session
The bad news
It turns out that not only is there no timeout option for Selenium to pass to ChromeDriver, but short of recompiling your own custom ChromeDriver, there is currently no way to change this value programmatically whatsoever. Sadly, looking at the source code shows that Google has hard-coded a timeout value of 60 seconds!
from chromium /src/chrome/test/chromedriver/chrome_launcher.cc#208:
std::unique_ptr<DevToolsHttpClient> client(new DevToolsHttpClient(
    address, context_getter, socket_factory, std::move(device_metrics),
    std::move(window_types), capabilities->page_load_strategy));
base::TimeTicks deadline =
    base::TimeTicks::Now() + base::TimeDelta::FromSeconds(60);
Status status = client->Init(deadline - base::TimeTicks::Now());
Until this code is changed to allow custom deadlines, the only option is a workaround.
The workaround
I ended up taking an approach that "primed" Chromium before having Selenium call ChromeDriver. This gets the one-time, post-upgrade slow start out of the way before ChromeDriver ever begins its countdown. The answer @PixelEinstein gave helped lead me down the right path, but this solution differs in two ways:
1. The call to open standalone Chromium here is blocking, while webbrowser.open_new() is not.
2. Standalone Chromium is always launched before ChromeDriver, whether it is needed or not. I did this because waiting one minute for ChromeDriver to time out, then waiting another 2.5 minutes for Chromium to start, then trying ChromeDriver again created a total delay of just over 3.5 minutes. Launching Chromium as the first action brings the total wait time down to about 2.5 minutes, as you skip the initial ChromeDriver timeout. On occasions when the long startup time doesn't occur, this "double loading" of Chromium is negligible, as the whole process finishes in a matter of seconds.
Here's the code snippet:
#!/usr/bin/env python3
import subprocess
from selenium import webdriver
some_site = 'http://www.google.com'
chromedriver_path = '/usr/lib/chromium-browser/chromedriver'
# Block until Chromium finishes launching and self-terminates
subprocess.run(['chromium-browser', '--headless', '--disable-gpu', '--dump-dom', 'about:blank'])
browser = webdriver.Chrome(executable_path=chromedriver_path)
browser.get(some_site)
# Continue on with your Selenium business...
Before instantiating a webdriver.Chrome() object, this waits for Chromium to finish its post-upgrade housekeeping no matter how long it takes. Chromium is launched in headless mode where --dump-dom is a one-shot operation that writes the requested web page (in this case about:blank) to stdout, which is ignored. Chromium self-terminates after completing the operation, which then returns from the subprocess.run() call, unblocking program flow. After that, it's safe to let ChromeDriver start its countdown, as Chromium will launch in a matter of seconds.
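Since the 60-second deadline is hard-coded, another workaround worth sketching is simply retrying driver creation after ChromeDriver gives up, letting Chromium finish its housekeeping in the background between attempts. Here factory is an assumed zero-argument callable such as lambda: webdriver.Chrome(options=chrome_options), and create_driver_with_retry is a hypothetical helper:

```python
import time

def create_driver_with_retry(factory, attempts=3, backoff=5):
    """Retry driver creation, since ChromeDriver's own 60 s launch
    deadline cannot be configured."""
    last_error = None
    for attempt in range(attempts):
        try:
            return factory()
        except Exception as exc:  # Selenium raises WebDriverException here
            last_error = exc
            time.sleep(backoff * (attempt + 1))  # linear backoff between tries
    raise last_error
```

This is less efficient than the priming approach above when the slow start always happens, but it also covers any other transient launch failure.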
