Is there any way to avoid detection of selenium? [duplicate]

I've been testing out Selenium with Chromedriver and I noticed that some pages can detect that you're using Selenium even though there's no automation at all. Even when I'm just browsing manually in Chrome launched through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I've checked my user agent and my browser fingerprint, and they are identical to those of the normal Chrome browser.
When I browse to these sites in normal Chrome everything works fine, but the moment I use Selenium I'm detected.
In theory, chromedriver and Chrome should look literally exactly the same to any web server, but somehow they can detect it.
If you want some test code try out this:
from pyvirtualdisplay import Display
from selenium import webdriver
# Start a visible virtual display for the browser to render into
display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery")
chrome_options.add_argument("--start-maximized")
# Selenium 4 takes `options=`; older releases used `chrome_options=`
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800, 800)
driver.set_window_position(0, 0)
print('arguments done')
driver.get('http://stubhub.com')
If you browse around stubhub you'll get redirected and 'blocked' within one or two requests. I've been investigating this and I can't figure out how they can tell that a user is using Selenium.
How do they do it?
I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal Firefox browser with only the additional plugin.
When I use Fiddler to view the HTTP requests being sent back and forth I've noticed that the 'fake browser's' requests often have 'no-cache' in the response header.
Results like "Is there a way to detect that I'm in a Selenium Webdriver page from JavaScript?" suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.
The site uploads a fingerprint to their servers, but I checked and the fingerprint of Selenium is identical to the fingerprint when using Chrome.
This is one of the fingerprint payloads that they send to their servers:
{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,
"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},
"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},
"screen":{"width":1600,"height":900,"colorDepth":24},
"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}
It's identical in Selenium and in Chrome.
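One way to back up the "identical fingerprint" claim is to diff the two captured payloads key by key instead of comparing them by eye. The following is a minimal sketch; the function name `fingerprint_diff` and the two shortened sample payloads are hypothetical and only illustrate the idea.

```python
import json


def fingerprint_diff(a: dict, b: dict, prefix: str = "") -> dict:
    """Return {dotted_key: (value_in_a, value_in_b)} for every leaf that differs."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else str(key)
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested sections such as "screen" or "plugins"
            diffs.update(fingerprint_diff(left, right, path))
        elif left != right:
            diffs[path] = (left, right)
    return diffs


# Hypothetical, shortened payloads for illustration only.
normal = json.loads('{"platform": "Linuxx86_64", "screen": {"width": 1600, "height": 900}}')
selenium_run = json.loads('{"platform": "Linuxx86_64", "screen": {"width": 800, "height": 900}}')

print(fingerprint_diff(normal, selenium_run))
```

An empty result means the two payloads really are the same, which would point at detection happening outside this particular fingerprint.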
VPNs work for a single use, but they get detected after I load the first page. Clearly some JavaScript code is being run to detect Selenium.

Basically, the way Selenium detection works is that they test for predefined JavaScript variables which appear when running with Selenium. Bot detection scripts usually look for anything containing the word "selenium" or "webdriver" in any of the variables on the window object, and also for document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are using; different browsers expose different things.
For me, I used Chrome, so all I had to do was ensure that $cdc_ no longer existed as a document variable, and voilà (download the chromedriver source code, rename $cdc_ to something else, and recompile chromedriver).
This is the function I modified in chromedriver:
File call_function.js:
function getPageCache(opt_doc) {
    var doc = opt_doc || document;
    //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
    var key = 'randomblabla_';
    if (!(key in doc))
        doc[key] = new Cache();
    return doc[key];
}
(Note the comment. All I did was change $cdc_ to randomblabla_.)
Here is pseudocode which demonstrates some of the techniques that bot detection scripts might use:
runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];
    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];
    // Known automation globals on window
    for (const windowDetectionKey of windowDetectionKeys) {
        if (window[windowDetectionKey]) {
            return true;
        }
    }
    // Known automation properties on document
    for (const documentDetectionKey of documentDetectionKeys) {
        if (window['document'][documentDetectionKey]) {
            return true;
        }
    }
    // chromedriver's page cache, e.g. $cdc_..., on document
    for (const documentKey in window['document']) {
        if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }
    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;
    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;
    return false;
};
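To experiment with these checks offline, the name-based part of the pseudocode above can be mirrored in Python and fed lists of property names dumped from a page. This is only a sketch: `looks_automated` is a hypothetical helper, it matches on names alone (the JavaScript also checks that the value is truthy and that the `$cdc_` object has a `cache_` member), and it skips the `window.external` check.

```python
import re

DOCUMENT_KEYS = {
    "__webdriver_evaluate", "__selenium_evaluate", "__webdriver_script_function",
    "__webdriver_script_func", "__webdriver_script_fn", "__fxdriver_evaluate",
    "__driver_unwrapped", "__webdriver_unwrapped", "__driver_evaluate",
    "__selenium_unwrapped", "__fxdriver_unwrapped",
}
WINDOW_KEYS = {
    "_phantom", "__nightmare", "_selenium", "callPhantom",
    "callSelenium", "_Selenium_IDE_Recorder",
}
ROOT_ATTRIBUTES = {"selenium", "webdriver", "driver"}
# Same pattern as the pseudocode: "$", one lowercase letter, then "dc_"
CDC_PATTERN = re.compile(r"\$[a-z]dc_")


def looks_automated(window_props, document_props, root_attrs=()):
    """Return True if any property or attribute name matches a known marker."""
    if WINDOW_KEYS & set(window_props):
        return True
    if DOCUMENT_KEYS & set(document_props):
        return True
    if any(CDC_PATTERN.search(name) for name in document_props):
        return True
    return bool(ROOT_ATTRIBUTES & set(root_attrs))


print(looks_automated([], ["$cdc_asdjflasutopfhvcZLmcfl_"]))
print(looks_automated(["location"], ["body", "randomblabla_"]))
```

Running it against names collected before and after renaming the key shows quickly whether a given marker is still exposed.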
According to user szx, it is also possible to simply open chromedriver.exe in a hex editor, and just do the replacement manually, without actually doing any compiling.

Replacing cdc_ string
You can use Vim or Perl to replace the cdc_ string in chromedriver. See the answer by @Erti-Chris Eelmaa to learn more about that string and how it's a detection point.
Using Vim or Perl prevents you from having to recompile source code or use a hex editor.
Make sure to make a copy of the original chromedriver before attempting to edit it.
Our goal is to alter the cdc_ string, which looks something like $cdc_lasutopfhvcZLmcfl.
The methods below were tested on chromedriver version 2.41.578706.
Using Vim
vim /path/to/chromedriver
After running the line above, you'll probably see a bunch of gibberish. Do the following:
Replace all instances of cdc_ with dog_ by typing :%s/cdc_/dog_/g.
dog_ is just an example. You can choose anything as long as it has the same amount of characters as the search string (e.g., cdc_), otherwise the chromedriver will fail.
To save the changes and quit, type :wq! and press return.
If you need to quit without saving changes, type :q! and press return.
Using Perl
The line below replaces all cdc_ occurrences with dog_. Credit to Vic Seedoubleyew:
perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver
Make sure that the replacement string (e.g., dog_) has the same number of characters as the search string (e.g., cdc_), otherwise the chromedriver will fail.
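The same replacement can be sketched in Python, with the equal-length rule enforced in code so a mistyped replacement cannot corrupt the binary. The function names `replace_marker` and `patch_file` are hypothetical, and writing to a separate destination path is a design choice that keeps the original driver untouched.

```python
def replace_marker(data: bytes, old: bytes = b"cdc_", new: bytes = b"dog_") -> bytes:
    """Replace every occurrence of `old` with `new` without changing the length."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the marker")
    return data.replace(old, new)


def patch_file(src_path: str, dst_path: str) -> int:
    """Write a patched copy of src_path to dst_path; return how many markers were found."""
    with open(src_path, "rb") as fh:
        original = fh.read()
    with open(dst_path, "wb") as fh:
        fh.write(replace_marker(original))
    return original.count(b"cdc_")


sample = b"\x00var key = '$cdc_asdjflasutopfhvcZLmcfl_';\x00"
print(replace_marker(sample))
```

Note that a file written this way is not marked executable, so on Linux or macOS the copy would still need `chmod +x` before use.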
Wrapping Up
To verify that all occurrences of cdc_ were replaced:
grep "cdc_" /path/to/chromedriver
If no output was returned, the replacement was successful.
Go to the altered chromedriver and double click on it. A terminal window should open up. If you don't see killed in the output, you've successfully altered the driver.
Make sure that the name of the altered chromedriver binary is chromedriver, and that the original binary is either moved from its original location or renamed.
My Experience With This Method
I was previously being detected on a website while trying to log in, but after replacing cdc_ with an equal sized string, I was able to log in. Like others have said though, if you've already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, etc.

As we've already figured out in the question and the posted answers, there is an anti Web-scraping and a bot detection service called "Distil Networks" in play here. And, according to the company CEO's interview:
Even though they can create new bots, we figured out a way to identify
Selenium the a tool they’re using, so we’re blocking Selenium no
matter how many times they iterate on that bot. We’re doing that now
with Python and a lot of different technologies. Once we see a pattern
emerge from one type of bot, then we work to reverse engineer the
technology they use and identify it as malicious.
It will take time and further experiments to understand exactly how they are detecting Selenium, but here is what we can say for sure at the moment:
it's not related to the actions you take with Selenium. Once you navigate to the site, you get immediately detected and banned. I've tried adding artificial random delays between actions and pausing after the page is loaded; nothing helped
it's not about the browser fingerprint either. I tried multiple browsers, with clean profiles and without, and in incognito mode, but nothing helped
since, according to the hint in the interview, this was "reverse engineering", I suspect this is done with some JavaScript code being executed in the browser, revealing that the browser is automated via Selenium WebDriver
I decided to post it as an answer, since clearly:
Can a website detect when you are using selenium with chromedriver?
Yes.
Also, I haven't experimented with older Selenium and older browser versions. In theory, there could be something implemented or added to Selenium at a certain point that the Distil Networks bot detector currently relies on. If this is the case, we might detect (yeah, let's detect the detector) at what point/version the relevant change was made, look into the changelog and changesets, and maybe this could give us more information on where to look and what they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.

A lot has been analyzed and discussed about websites detecting that they are being driven by a Selenium-controlled ChromeDriver. Here are my two cents:
According to the article Browser detection using the user agent, serving different webpages or services to different browsers is usually not among the best of ideas. The web is meant to be accessible to everyone, regardless of which browser or device a user is using. There are best practices for developing a website that progressively enhances itself based on feature availability rather than by targeting specific browsers.
However, browsers and standards are not perfect, and there are still edge cases where websites detect the browser and whether it is driven by a Selenium-controlled WebDriver. Browsers can be detected in different ways; some commonly used mechanisms are as follows:
Implementing captcha / recaptcha to detect the automatic bots.
You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?
Detecting the term HeadlessChrome within headless Chrome UserAgent
You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python
Using Bot Management service from Distil Networks
You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login
Using Bot Manager service from Akamai
You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python
Using Bot Protection service from Datadome
You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python
However, while using the user agent to detect the browser looks simple, doing it well is in fact a bit tougher.
Note: At this point it's worth mentioning that it's very rarely a good idea to use user agent sniffing. There is almost always a better and more broadly compatible way to address a given issue.
Considerations for browser detection
The idea behind detecting the browser can be either of the following:
Trying to work around a specific bug in some specific variant or specific version of a webbrowser.
Trying to check for the existence of a specific feature that some browsers don't yet support.
Trying to provide different HTML depending on which browser is being used.
Alternative of browser detection through UserAgents
Some of the alternatives of browser detection are as follows:
Implementing a test to detect how the browser implements the API of a feature, and determining how to use it from that. An example is when Chrome unflagged experimental lookbehind support in regular expressions.
Adopting the design technique of progressive enhancement, which involves developing a website in layers using a bottom-up approach: starting with a simpler layer and improving the capabilities of the site in successive layers, each using more features.
Adopting the top-down approach of graceful degradation, in which we build the best possible site using all the features we want and then tweak it to make it work on older browsers.
Solution
To prevent a Selenium-driven WebDriver from being detected, you can use any or all of the approaches below:
Rotating the UserAgent in every execution of your test suite using the fake_useragent module as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from fake_useragent import UserAgent
options = Options()
ua = UserAgent()
userAgent = ua.random  # pick a random user agent string
print(userAgent)
options.add_argument(f'user-agent={userAgent}')
# Selenium 4 style: the driver path goes through Service, the options through `options=`
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe'), options=options)
driver.get("https://www.google.co.in")
driver.quit()
You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?
Rotating the UserAgent in each of your Tests using Network.setUserAgentOverride through execute_cdp_cmd() as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'))
# User agent before the override
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.97
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
# User agent after the override
print(driver.execute_script("return navigator.userAgent;"))
You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python
Changing the property value of navigator for webdriver to undefined as follows:
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
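The `Page.addScriptToEvaluateOnNewDocument` call above can be wrapped in a small helper so the same patch is installed for every driver you create. This is a sketch under assumptions: `install_webdriver_patch` and `RecordingDriver` are hypothetical names, and the recording class is only a stand-in so the helper can be exercised without launching Chrome. With a real Selenium Chrome driver you would pass the driver object instead.

```python
# JavaScript taken from the snippet above, collapsed onto one line
WEBDRIVER_PATCH = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def install_webdriver_patch(driver, script: str = WEBDRIVER_PATCH):
    """Ask Chrome to run `script` at the start of every new document."""
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": script}
    )


class RecordingDriver:
    """Offline stand-in: records CDP calls instead of talking to a browser."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, cmd, params):
        self.calls.append((cmd, params))
        return {}


fake = RecordingDriver()
install_webdriver_patch(fake)
print(fake.calls[0][0])
```

Because the script is registered for new documents, it would need to be installed before the first `driver.get(...)` for it to affect that page.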
Changing the values of navigator.plugins, navigator.languages, WebGL, hairline feature, missing image, etc.
You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?
Changing the conventional Viewport
You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
Dealing with reCAPTCHA
When dealing with 2captcha and reCAPTCHA v3, rather than clicking the checkbox next to the text "I'm not a robot", it may be easier to authenticate by extracting and using the data-sitekey.
You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?
tl; dr
You can find a cutting edge solution to evade webdriver detection in:
selenium-stealth - a proven way to evade webdriver detection

With the availability of Selenium Stealth, evading detection of a Selenium-driven, ChromeDriver-initiated google-chrome browsing context has become much easier.
selenium-stealth
selenium-stealth is a Python package to prevent detection. It tries to make Python Selenium more stealthy. However, as of now selenium-stealth only supports Selenium with Chrome.
Features that selenium-stealth currently offers:
Selenium with selenium-stealth passes all public bot tests.
With selenium-stealth, Selenium can log in to a Google account.
selenium-stealth helps maintain a normal reCAPTCHA v3 score.
Installation
Selenium-stealth is available on PyPI so you can install with pip as follows:
pip install selenium-stealth
selenium4 compatible code
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
options = Options()
options.add_argument("start-maximized")
# Chrome is controlled by automated test software
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
# Selenium Stealth settings
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
tl; dr
You can find a couple of relevant detailed discussions in:
Can a website detect when you are using Selenium with chromedriver?
How to automate login to a site which is detecting my attempts to login using selenium-stealth
Undetected Chromedriver not loading correctly

Example of how it's implemented on wellsfargo.com:
try {
if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
if ("__webdriver_script_fn" in document) return !+""

Obfuscating JavaScript result
I have checked the chromedriver source code. That injects some JavaScript files into the browser.
Every JavaScript file in this link is injected to the web pages:
https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/js/
So I reverse engineered the driver and obfuscated the JavaScript files by hex editing. At that point I was sure that no JavaScript variables, function names or fixed strings could be used to uncover Selenium activity. But some sites and reCAPTCHA still detect Selenium!
Maybe they check the modifications that are caused by chromedriver JavaScript execution :)
Chrome 'navigator' parameters modification
I discovered there are some parameters in 'navigator' that give away the use of chromedriver.
These are the parameters:
"navigator.webdriver" In non-automated mode it is 'undefined'. In automated mode it's 'true'.
"navigator.plugins" In headless Chrome, it has 0 length. So I added some fake elements to fool the plugin length check.
"navigator.languages" was set to the default Chrome value '["en-US", "en", "es"]'.
So what I needed was a Chrome extension to run JavaScript on the web pages. I made an extension with the JavaScript code provided in the article and used another article to add the zipped extension to my project. I successfully changed the values, but still nothing changed!
I didn't find other variables like these, but that doesn't mean they don't exist. reCAPTCHA still detects chromedriver, so there must be more variables to change. The next step would be reverse engineering the detector services, which I don't want to do.
Now I'm not sure whether it is worth spending more time on this automation process or whether I should search for alternative methods!

Try to use Selenium with a specific user profile of Chrome. That way you can use it as specific user and define anything you want. When doing so, it will run as a 'real' user. Look at the Chrome process with some process explorer and you'll see the difference with the tags.
For example:
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
username = os.getenv("USERNAME")
# Parentheses let the concatenation span two lines
userProfile = ("C:\\Users\\" + username +
               "\\AppData\\Local\\Google\\Chrome\\User Data\\Default")
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# Add any tag here you want.
options.add_experimental_option(
    "excludeSwitches",
    """
    ignore-certificate-errors
    safebrowsing-disable-download-protection
    safebrowsing-disable-auto-update
    disable-client-side-phishing-detection
    """.split()
)
# Raw string so the backslashes in the Windows path are kept literally
chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
# Selenium 4 style; older releases used executable_path= and chrome_options=
browser = webdriver.Chrome(service=Service(chromedriver), options=options)
Google Chrome tag list here

partial interface Navigator { readonly attribute boolean webdriver; };
The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.
This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.
Taken directly from the 2017 W3C Editor's Draft of WebDriver. This heavily implies that, at the very least, future iterations of Selenium's drivers will be identifiable to prevent misuse. Ultimately, it's hard to tell without the source code what exactly causes ChromeDriver specifically to be detectable.

All I had to do was:
my_options = webdriver.ChromeOptions()
my_options.add_argument( '--disable-blink-features=AutomationControlled' )
Some more information on this: it relates to the website skyscanner.com. In the past I was able to scrape it. Yes, it did detect the browser automation and gave me a captcha where I had to press and hold a button. I used to be able to complete the captcha manually, then search flights and scrape. But this time around, after completing the captcha I got the same captcha again and again and could not escape from it. I tried some of the most popular suggestions to avoid automation being detected, but they didn't work. Then I found this article which did work, and by process of elimination I found out that the option above alone was enough to get around their browser automation detection. Now I don't even get the captcha and everything else seems to be working normally.
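If you want to apply this flag together with the `excludeSwitches` and `useAutomationExtension` settings mentioned elsewhere in this thread, a small helper keeps them in one place. This is a sketch: `apply_automation_flags` is a hypothetical name, and `FakeOptions` is only a stand-in for offline testing. A real `webdriver.ChromeOptions()` object exposes the same two methods and could be passed in instead.

```python
def apply_automation_flags(options):
    """Apply the automation-related Chrome settings discussed in this thread."""
    # Stops Blink from advertising that the browser is automation controlled
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Drops the enable-automation switch that chromedriver adds by default
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options


class FakeOptions:
    """Offline stand-in with the same two methods as Selenium's ChromeOptions."""

    def __init__(self):
        self.arguments = []
        self.experimental = {}

    def add_argument(self, arg):
        self.arguments.append(arg)

    def add_experimental_option(self, name, value):
        self.experimental[name] = value


opts = apply_automation_flags(FakeOptions())
print(opts.arguments, opts.experimental)
```

Whether all three settings are needed depends on the site; the answer above reports that the first one alone was enough for skyscanner.com.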
Versions I am running currently:
OS: Windows 7 64 bit
Python 3.8.0 (tags/v3.8.0:fa919fd, 2019-10-14) (MSC v.1916 64 bit (AMD64)) on win32
Browser: Chrome Version 100.0.4896.60 (Official Build) (64-bit)
Selenium 4.1.3
ChromeDriver 100.0.4896.60 chromedriver_win32.zip 930ff33ae8babeaa74e0dd1ce1dae7ff

Removing the webdriver property from navigator works for some websites:
from selenium import webdriver
driver = webdriver.Chrome()
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source":
"const newProto = navigator.__proto__;"
"delete newProto.webdriver;"
"navigator.__proto__ = newProto;"
})

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn't find it in the new one except for some very vague wording in the appendices.
A test for it is in the Selenium code in the file fingerprint_test.js, where the comment at the end says "Currently only implemented in firefox", but I wasn't able to identify any code in that direction with some simple grepping, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.
I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.

In addition to the great answer of Erti-Chris Eelmaa: there is the annoying window.navigator.webdriver property, and it is read-only. Even if you assign false to it, it will still be true. That's why a browser driven by automated software can still be detected.
MDN
The variable is managed by the flag --enable-automation in chrome. The chromedriver launches Chrome with that flag and Chrome sets the window.navigator.webdriver to true. You can find it here. You need to add to "exclude switches" the flag. For instance (Go):
package main
import (
	"fmt"
	"log"
	"github.com/tebeka/selenium"
	"github.com/tebeka/selenium/chrome"
)
func main() {
	caps := selenium.Capabilities{
		"browserName": "chrome",
	}
	chromeCaps := chrome.Capabilities{
		Path:            "/path/to/chrome-binary",
		ExcludeSwitches: []string{"enable-automation"},
	}
	caps.AddChrome(chromeCaps)
	wd, err := selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", 4444))
	if err != nil {
		log.Fatal(err)
	}
	// Close the browser session when main returns
	defer wd.Quit()
}

One more thing I found is that some websites use a platform that checks the user agent. If the value contains "HeadlessChrome", the behavior can be weird when using headless mode.
The workaround is to override the user agent value, for example in Java:
chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
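In Python, the same idea can be sketched as two small helpers: one that spots the headless marker and one that builds the `--user-agent=` argument with the marker removed. Both function names are hypothetical, and the sample user agent string is made up for illustration.

```python
def is_headless_user_agent(user_agent: str) -> bool:
    """True if the user agent string carries the HeadlessChrome marker."""
    return "HeadlessChrome" in user_agent


def user_agent_argument(user_agent: str) -> str:
    """Build a --user-agent argument with HeadlessChrome rewritten to Chrome."""
    return "--user-agent=" + user_agent.replace("HeadlessChrome", "Chrome")


# Hypothetical user agent as reported by an old-style headless session
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/73.0.3683.86 Safari/537.36"
)
print(is_headless_user_agent(headless_ua))
print(user_agent_argument(headless_ua))
```

The resulting string would be passed to `options.add_argument(...)` before the driver is created.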

The bot detection I've seen seems more sophisticated or at least different than what I've read through in the answers below.
Experiment 1
I open a browser and web page with Selenium from a Python console.
The mouse is already at a specific location where I know a link will appear once the page loads. I never move the mouse.
I press the left mouse button once (this is necessary to take focus from the console where Python is running to the browser).
I press the left mouse button again (remember, cursor is above a given link).
The link opens normally, as it should.
Experiment 2
As before, I open a browser and the web page with Selenium from a Python console.
This time around, instead of clicking with the mouse, I use Selenium (in the Python console) to click the same element with a random offset.
The link doesn't open, but I am taken to a sign up page.
Implications
opening a web browser via Selenium doesn't preclude me from appearing human
moving the mouse like a human is not necessary to be classified as human
clicking something via Selenium with an offset still raises the alarm
It seems mysterious, but I guess they can just determine whether an action originates from Selenium or not, while they don't care whether the browser itself was opened via Selenium or not. Or can they determine if the window has focus? It would be interesting to hear if anyone has any insights.

It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work.
In reality, what you are asking is how to do bot detection evasion. That is not what Selenium WebDriver is for. It is for testing your web application not hitting other web applications. It is possible, but basically, you'd have to look at what a WAF looks for in their rule set and specifically avoid it with selenium if you can. Even then, it might still not work because you don't know what WAF they are using.
You did the right first step, that is, faking the user agent. If that didn't work though, then a WAF is in place and you probably need to get more tricky.
Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.

Even if you are sending all the right data (e.g. Selenium doesn't show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.
For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.
It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it'll also help you verify whether there are any specific parameters that indicate you're running in Selenium.
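The timing heuristic mentioned above (acting less than a second after the page loads) can be illustrated from the detector's side with a short sketch. The function name `suspicious_gaps` and the one-second default threshold are assumptions for illustration, not a description of any real service.

```python
def suspicious_gaps(timestamps, min_gap_seconds=1.0):
    """Return consecutive (earlier, later) pairs that are closer than the threshold."""
    return [
        (earlier, later)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier < min_gap_seconds
    ]


# Hypothetical event times in seconds: page load, click, next click
events = [0.0, 0.2, 3.0]
print(suspicious_gaps(events))
```

Running your own action log through a check like this gives a rough idea of which steps in a script happen faster than a person plausibly could.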

Answer: YES
Some sites detect Selenium through the browser's fingerprints and other data; other sites detect Selenium based on behavior, not only what you do but also what you don't do.
Usually the data that Selenium exposes is enough to detect it.
You can check the browser fingerprints on sites like these:
https://bot.sannysoft.com
https://fingerprintjs.github.io/fingerprintjs/
https://antoinevastel.com/bots/
Try them with your normal browser, then with Selenium, and you'll see the differences.
You can change some fingerprints with options(), such as the user agent, and see the results for yourself.
You can try to avoid this detection in many ways. I recommend using this library, undetected_chromedriver:
https://github.com/ultrafunkamsterdam/undetected-chromedriver
import undetected_chromedriver.v2 as uc
Otherwise, you can try an alternative to Selenium. I have heard of PhantomJS but haven't tried it.

Some sites are detecting this:
function d() {
try {
if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
return !0
} catch (e) {}
try {
//if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
if (window.document.documentElement.getAttribute("webdriver"))
return !0
} catch (e) {}
try {
//if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
if ("_Selenium_IDE_Recorder" in window)
return !0
} catch (e) {}
try {
//if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
if ("__webdriver_script_fn" in document)
return !0
} catch (e) {}

It seems to me the simplest way to do it with Selenium is to intercept the XHR that sends back the browser fingerprint.
But since this is a Selenium-only problem, it’s better just to use something else. Selenium is supposed to make things like this easier, not way harder.

Write an HTML page with the following code. You will see that in the DOM selenium applies a webdriver attribute in the outerHTML:
<html>
<head>
<script type="text/javascript">
<!--
function showWindow(){
javascript:(alert(document.documentElement.outerHTML));
}
//-->
</script>
</head>
<body>
<form>
<input type="button" value="Show outerHTML" onclick="showWindow()">
</form>
</body>
</html>

You can try excluding the "enable-automation" switch:
var options = new ChromeOptions();
// hide selenium
options.AddExcludedArguments(new List<string>() { "enable-automation" });
var driver = new ChromeDriver(ChromeDriverService.CreateDefaultService(), options);
Be warned, though, that this workaround stopped working as of ChromeDriver 79.0.3945.16.
So you would probably need to use an older version of Chrome.
As another option, you can try using InternetExplorerDriver instead of Chrome. In my experience, IE is not blocked at all, without any hacks.
And for more info try to take a look here:
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Unable to hide "Chrome is being controlled by automated software" infobar within Chrome v76

I've found changing the JavaScript "key" variable like this:
//Fools the website into believing a human is navigating it
((JavascriptExecutor)driver).executeScript("window.key = \"blahblah\";");
works for some websites when using Selenium WebDriver along with Google Chrome, since many sites check for this variable in order to avoid being scraped by Selenium.

I have the same problem and solved the issue with the following configuration (in C#)
options.AddArguments("start-maximized");
options.AddArguments("--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
options.AddExcludedArgument("enable-automation"); // For hiding chrome being controlled by automation..
options.AddAdditionalCapability("useAutomationExtension", false);
// Import cookies
options.AddArguments("user-data-dir=" + userDataDir);
options.AddArguments("profile-directory=" + profileDir);
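For Python users, a rough equivalent of the C# configuration above might look like the sketch below. It assumes Selenium's `ChromeOptions.add_experimental_option`; the function names `automation_hiding_options` and `build_chrome_options` are hypothetical, and whether these switches still help depends on your Chrome and ChromeDriver versions.

```python
def automation_hiding_options():
    """Experimental options mirroring the C# settings above."""
    return {
        "excludeSwitches": ["enable-automation"],
        "useAutomationExtension": False,
    }


def build_chrome_options(user_data_dir=None, profile_dir=None):
    """Build ChromeOptions; the directory arguments are optional placeholders."""
    # Imported lazily so the helper above can be inspected without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    for name, value in automation_hiding_options().items():
        options.add_experimental_option(name, value)
    # Reuse an existing profile (and its cookies) when one is given.
    if user_data_dir:
        options.add_argument(f"user-data-dir={user_data_dir}")
    if profile_dir:
        options.add_argument(f"profile-directory={profile_dir}")
    return options
```

Usage would be something like `webdriver.Chrome(options=build_chrome_options())`.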

The Chromium developers recently added a 2nd headless mode in 2021, which no longer adds HeadlessChrome to the user agent string. See https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c36
And they later renamed the option in 2023 for Chrome 109 -> https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4
The newer --headless=new flag will now allow you to get the full functionality of Chrome in the new headless mode, and you can even run extensions in it, for Chrome 109 and above. (If using Chrome 96 through 108, use the older --headless=chrome option.)
Usage: (Chrome 109 and above):
options.add_argument("--headless=new")
Usage: (Chrome 96 through Chrome 108):
options.add_argument("--headless=chrome")
This new headless mode makes Chromium browsers work just like regular mode, which means they won't be as easily detected as Chrome in the older headless mode.
Combine that with other tools such as undetected-chromedriver for maximum evasion against Selenium-detection.
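If your script has to run against several Chrome versions, the version rules described above can be wrapped in a small helper. This is only a sketch: `headless_argument` is a name I made up, and falling back to plain `--headless` for Chrome older than 96 is my assumption, not something stated above.

```python
def headless_argument(chrome_major_version: int) -> str:
    """Pick the headless flag for a given Chrome major version."""
    if chrome_major_version >= 109:
        return "--headless=new"      # Chrome 109 and above
    if chrome_major_version >= 96:
        return "--headless=chrome"   # Chrome 96 through 108
    # Assumption: older builds only have the classic headless mode.
    return "--headless"
```

You would then call `options.add_argument(headless_argument(120))` with whatever major version you have installed.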

Related

Selenium WebDriver - google account login problem using python [duplicate]

I have a problem with Google login. I want to login to my account but Google says that automation drivers are not allowed to log in.
I am looking for a solution. Is it possible to get a cookie from a normal Firefox/Chrome session and load it into ChromeDriver/GeckoDriver? I thought this could be a solution, but I am not sure whether it is possible.
Looking for solutions...
Also, I want to add a quick solution. I solved this issue by
using one of my old verified account. That can be a quick solution for
you.

I had the same problem and found the solution for it. I am using
1) Windows 10 Pro
2) Chrome Version 83.0.4103.97 (Official Build) (64-bit)
3) selenium ChromeDriver 83.0.4103.39
Some simple C# code which open google pages
var options = new ChromeOptions();
options.AddArguments(@"user-data-dir=c:\Users\{username}\AppData\Local\Google\Chrome\User Data\");
// Create the driver once, passing the options so the profile is actually used.
IWebDriver driver = new ChromeDriver(Directory.GetCurrentDirectory(), options);
driver.Url = "https://accounts.google.com/";
Console.ReadKey();
The core problem is that you can't log in through the Selenium-driven browser, but you can use a profile that is already logged in to the Google account.
You have to find where Chrome stores your profile and pass it with the "user-data-dir" option.
PS. Replace {username} with your real account name.
On linux the user profile is in "~/.config/google-chrome".
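To avoid hard-coding the path, the locations mentioned above can be computed per platform. This is a minimal sketch: `chrome_user_data_dir` is a hypothetical helper, and the macOS location is my addition (the usual default), not something this answer states.

```python
from pathlib import PurePosixPath, PureWindowsPath


def chrome_user_data_dir(system: str, home: str):
    """Return Chrome's default user data directory for a platform name.

    `system` is expected to look like platform.system() output.
    """
    if system == "Windows":
        return PureWindowsPath(home) / "AppData" / "Local" / "Google" / "Chrome" / "User Data"
    if system == "Linux":
        return PurePosixPath(home) / ".config" / "google-chrome"
    if system == "Darwin":
        # Assumption: standard macOS location, not given in the answer above.
        return PurePosixPath(home) / "Library" / "Application Support" / "Google" / "Chrome"
    raise ValueError(f"unsupported platform: {system}")
```

In a real script you could call it as `chrome_user_data_dir(platform.system(), str(Path.home()))` and pass the result to the "user-data-dir" option.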

This error message...
This browser or app may not be secure
...implies that the WebDriver instance was unable to authenticate the Browsing Context, i.e. the browser session.
This error can happen due to different factors as follows:
In the article "This browser or app may not be secure" error when trying to sign in with Google on desktop apps, @Raphael Schaad mentioned that if a user can log into the same app just fine with other Google accounts, then the problem must be with the particular account. In the majority of cases the likely reason is that this particular user account is configured with Two Factor Authentication.
In the article Less secure apps & your Google Account it is mentioned that if an app or site doesn’t meet Google's security standards, Google may block anyone who’s trying to sign in to your account from it. Less secure apps can make it easier for hackers to get in to your account, so blocking sign-ins from these apps helps keep your account safe.
Solution
In these cases the respective solution would be to:
Disable Two Factor Authentication for this Google account and execute your @Test.
Allow less secure apps
You can find a detailed discussion in Unable to sign into google with selenium automation because of "This browser or app may not be secure."
Deep Dive
However, to help protect your account, Google may not let you sign in from some browsers. Google might stop sign-ins from browsers that:
Don't support JavaScript or have JavaScript turned off.
Have AutomationExtension or unsecure or unsupported extensions added.
Use automation testing frameworks.
Are embedded in a different application.
Solution
In these cases there are diverse solutions:
Use a browser that supports JavaScript:
Chrome
Safari
Firefox
Opera
Internet Explorer
Edge
Turn on JavaScript in Web Browsers: If you’re using a supported browser and still can’t sign in, you might need to turn on JavaScript.
If you still can’t sign in, it might be because you have AutomationExtension / unsecure / unsupported extensions turned on and you may need to turn off as follows:
import java.util.Collections;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class browserAppDemo
{
    public static void main(String[] args) throws Exception
    {
        System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
        ChromeOptions options = new ChromeOptions();
        options.addArguments("start-maximized");
        // Turn off the automation extension and the enable-automation switch
        options.setExperimentalOption("useAutomationExtension", false);
        options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
        WebDriver driver = new ChromeDriver(options);
        driver.get("https://accounts.google.com/signin");
        new WebDriverWait(driver, 10).until(ExpectedConditions.elementToBeClickable(By.xpath("//input[@id='identifierId']"))).sendKeys("gashu");
        driver.findElement(By.id("identifierNext")).click();
        new WebDriverWait(driver, 10).until(ExpectedConditions.elementToBeClickable(By.xpath("//input[@name='password']"))).sendKeys("gashu");
        driver.findElement(By.id("passwordNext")).click();
        System.out.println(driver.getTitle());
    }
}
You can find a couple of relevant discussions in:
Gmail login using selenium webdriver in java
Selenium test scripts to login into google account through new ajax login form
Additional Considerations
Finally, some old browser versions might not be supported, so ensure that:
JDK is upgraded to current levels JDK 8u241.
Selenium is upgraded to current levels Version 3.141.59.
ChromeDriver is updated to current ChromeDriver v80.0 level.
Chrome is updated to current Chrome Version 80.0 level. (as per ChromeDriver v80.0 release notes)

Solution without redirecting, using firefox driver, or changing any google account settings:
If you have a specific Google account you want to access, create a chrome profile with it and then load the chrome profile when using selenium:
options = webdriver.ChromeOptions()
options.add_argument("--user-data-dir=C:/Users/{userName}/AppData/Local/Google/Chrome/User Data/Profile {#}/")
driver = webdriver.Chrome("C:/bin/chromedriver.exe", chrome_options=options)
Windows:
The profile {#} in the file path above will vary so I suggest checking inside of the User Data folder which profile you want to use. For example, if you currently only have 1 chrome account there will be no Profile directory (resorts to "Default" directory) in User Data but if you create a second chrome account there will be a "Profile 1" directory in User Data.
Note that you should create a new google chrome profile to use with selenium because attempting to use a chrome profile that is already in use (opened in another chrome window) will cause an error.
Mac:
This solution may or may not work on mac but to find the chrome account folder/filepath follow the instructions in the comment left by @bfhaha
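Rather than browsing the User Data folder by hand, you could list the profile directories programmatically. The sketch below assumes the naming described above ("Default" plus "Profile 1", "Profile 2", ...); `list_chrome_profiles` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches "Default" and "Profile <number>" as described in the answer above.
PROFILE_NAME = re.compile(r"^(Default|Profile \d+)$")


def list_chrome_profiles(user_data_dir):
    """Return the names of profile directories inside a Chrome User Data folder."""
    root = Path(user_data_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and PROFILE_NAME.match(entry.name)
    )
```

The returned name can then be used to pick which profile Selenium should load. Note that the sort is alphabetical, so "Profile 10" comes before "Profile 2".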

One Solution that works for me: https://stackoverflow.com/a/60328992/12939291 or https://www.youtube.com/watch?v=HkgDRRWrZKg
Short: Stackoverflow Login with Google Account with Redirect
from selenium import webdriver
from time import sleep
class Google:
    def __init__(self, username, password):
        self.driver = webdriver.Chrome('./chromedriver')
        self.driver.get('https://stackoverflow.com/users/signup?ssrc=head&returnurl=%2fusers%2fstory%2fcurrent%27')
        sleep(3)
        # Click the "Sign up with Google" button, then fill in the Google login form
        self.driver.find_element_by_xpath('//*[@id="openid-buttons"]/button[1]').click()
        self.driver.find_element_by_xpath('//input[@type="email"]').send_keys(username)
        self.driver.find_element_by_xpath('//*[@id="identifierNext"]').click()
        sleep(3)
        self.driver.find_element_by_xpath('//input[@type="password"]').send_keys(password)
        self.driver.find_element_by_xpath('//*[@id="passwordNext"]').click()
        sleep(2)
        self.driver.get('https://youtube.com')
        sleep(5)
username = ''
password = ''
Google(username, password)

I just tried something out that worked for me after several hours of trial and error.
Adding args: ['--disable-web-security', '--user-data-dir', '--allow-running-insecure-content' ] to my config resolved the issue.
I realized later that this was not what helped me out as I tried with a different email and it didn't work. After some observations, I figured something else out and this has been tried and tested.
Using automation:
Go to https://stackoverflow.com/users/login
Select Log in with Google Strategy
Enter Google username and password
Login to Stackoverflow
Go to https://gmail.com (or whatever Google app you want to access)
After doing this consistently for like a whole day (about 24 hours), try automating your login directly to gmail (or whatever Google app you want to access) directly... I've had at least two other people do this with success.
PS - You might want to continue with the stackoverflow login until you at least get a captcha request as we all went through that phase as well.

This might be still open / not answered
Here is a working example (as of 28.04.2021) in the following thread:
https://stackoverflow.com/a/66308429/15784196
Use Firefox as driver. I tested his example and it did work!

@Mike-Fakesome on this https://gist.github.com/ikegami-yukino/51b247080976cb41fe93 thread suggests a solution that works:
import undetected_chromedriver.v2 as uc
import random,time,os,sys
from selenium.webdriver.common.keys import Keys
GMAIL = '<GMAIL_HERE>'
PASSWORD = '<PASSWORD_HERE>'
chrome_options = uc.ChromeOptions()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--profile-directory=Default")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-plugins-discovery")
chrome_options.add_argument("--incognito")
chrome_options.add_argument("user_agent=DN")
driver = uc.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.get("https://accounts.google.com/o/oauth2/v2/auth/oauthchooseaccount?redirect_uri=https%3A%2F%2Fdevelopers.google.com%2Foauthplayground&prompt=consent&response_type=code&client_id=407408718192.apps.googleusercontent.com&scope=email&access_type=offline&flowName=GeneralOAuthFlow")
driver.find_element_by_xpath("/html/body/div[1]/div[1]/div[2]/div/div[2]/div/div/div[2]/div/div[1]/div/form/span/section/div/div/div[1]/div/div[1]/div/div[1]/input").send_keys(GMAIL)
driver.find_element_by_xpath("/html/body/div[1]/div[1]/div[2]/div/div[2]/div/div/div[2]/div/div[1]/div/form/span/section/div/div/div[1]/div/div[1]/div/div[1]/input").send_keys(Keys.RETURN)
time.sleep(10)
Also, you can now use import undetected_chromedriver as uc instead of import undetected_chromedriver.v2 as uc.

I solved this issue last week using following steps:
First two steps are out of your project code.
Create a new user directory for Chrome browser.
You can name this folder whatever you like and place it anywhere.
Run Chrome browser in debugger mode using just created directory
cd C:\Program Files\Google\Chrome\Application
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\localhost"
You can use any free port but I followed this article:
https://chromedevtools.github.io/devtools-protocol/
Browser window opens.
Login manually to Google / Facebook / etc using opened window.
Close the browser.
In your project:
Copy chrome-user-directory you just created into 'resources' package.
Set debugging option for Chrome driver.
/**
* This method is added due to Google security policies changed.
* Now it's impossible to login in Google account via Selenium at first time.
* We use a user data directory for Chrome where we previously logged in.
*/
private WebDriver setWebDriver() {
ChromeOptions options = new ChromeOptions();
options.addArguments("--user-data-dir=" + System.getProperty("user.dir") + "/src/main/resources/localhost");
options.addArguments("--remote-debugging-port=9222");
return new ChromeDriver(options);
}
Enjoy.
PS: If you have another solution without copying chrome user-directory into the project, please share it)
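One possible alternative, sketched here in Python under the assumption that Chrome is still running with the remote debugging port from step 2, is to attach to that browser instead of copying its directory. ChromeDriver accepts a `debuggerAddress` option for this; the helper names `is_port_open` and `attach_to_running_chrome` are my own.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def attach_to_running_chrome(host: str = "127.0.0.1", port: int = 9222):
    """Attach Selenium to a Chrome started with --remote-debugging-port."""
    if not is_port_open(host, port):
        raise RuntimeError(f"no Chrome debugging port found at {host}:{port}")
    # Imported lazily so the port check works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", f"{host}:{port}")
    return webdriver.Chrome(options=options)
```

With this approach the browser window has to stay open (skip the "Close the browser" step), and the session you logged in to manually is reused as-is.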

I found a solution. @theycallmepix and @Yinka Albi are correct, but (I think) Google blacklisted accounts whose first login was programmatic, so those accounts couldn't log in normally afterwards. So basically, just use a different account and go to StackAuth or Stack Overflow. Then log in manually with Google (link your account first), then log in manually at google.com, and after that it should work programmatically.
P.S. Please comment if this doesn't work.

Use the below given snippet method to Login to your Google Account.
Language: Python3
Redirect via: StackAuth (Reason explained at the end)
[Edit: You need to import the required packages. Make sure that the Automation that you do is running in Foreground, I mean, it's not minimised until you login completely. Once if the login is successful, then you can re-direct to the required website that you want.]
# Assumes `driver` is an already created Selenium webdriver instance.
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait
def login(username, password): # Logs in the user
    driver.get('https://accounts.google.com/o/oauth2/auth/identifier?client_id=717762328687-iludtf96g1hinl76e4lc1b9a82g457nn.apps.googleusercontent'
               '.com&scope=profile%20email&redirect_uri=https%3A%2F%2Fstackauth.com%2Fauth%2Foauth2%2Fgoogle&state=%7B%22sid%22%3A1%2C%22st%22%3A%2'
               '259%3A3%3Abbc%2C16%3A561fd7d2e94237c0%2C10%3A1599663155%2C16%3Af18105f2b08c3ae6%2C2f06af367387a967072e3124597eeb4e36c2eff92d3eef697'
               '1d95ddb5dea5225%22%2C%22cdl%22%3Anull%2C%22cid%22%3A%22717762328687-iludtf96g1hinl76e4lc1b9a82g457nn.apps.googleusercontent.com%22%'
               '2C%22k%22%3A%22Google%22%2C%22ses%22%3A%2226bafb488fcc494f92c896ee923849b6%22%7D&response_type=code&flowName=GeneralOAuthFlow')
    driver.find_element_by_name("identifier").send_keys(username)
    WebDriverWait(driver, 10).until(expected_conditions.element_to_be_clickable((By.XPATH, "//*[@id='identifierNext']/div/button/div[2]"))).click()
    driver.implicitly_wait(4)
    try:
        driver.find_element_by_name("password").send_keys(password)
        WebDriverWait(driver, 2).until(expected_conditions.element_to_be_clickable((By.XPATH, "//*[@id='passwordNext']/div/button/div[2]"))).click()
    except TimeoutException:
        print('\nUsername/Password seems to be incorrect, please re-check\nand Re-Run the program.')
        del username, password
        exit()
    except NoSuchElementException:
        print('\nUsername/Password seems to be incorrect, please re-check\nand Re-Run the program.')
        del username, password
        exit()
    try:
        # A redirect to stackoverflow.com means the login went through
        WebDriverWait(driver, 5).until(lambda webpage: "https://stackoverflow.com/" in webpage.current_url)
        print('\nLogin Successful!\n')
    except TimeoutException:
        print('\nUsername/Password seems to be incorrect, please re-check\nand Re-Run the program.')
        exit()
The above code takes 2 parameters - gmailID and password. If the password or username is wrong, then you'll be notified.
Why stackauth?
-> Stackauth uses OAuth 2.0 authorisation to access Google APIs(here, Google account login needs Google API to work) to securely login a user into his/her Google Account.
Click here to read more about OAuth.
Edit:
I just answered to my own question which I'd posted yesterday thinking that it might help you.
As of now, 2021, it can successfully bypass all the google restrictions that used to occur when logging in.
Feel free to revert back if it doesn't work.
Link to my answer is here

If your Chrome browser was spun up using Chromedriver, then there is detectable evidence that websites can use to determine if you're using Selenium, and then they can block you. However, if the Chrome browser is spun up before Chromedriver connects to it, then you have a browser that no longer looks like an automation-controlled one. Modern web automation libraries such as undetected-chromedriver are aware of this, and so they make sure Chrome is spun up before connecting chromedriver to it.
The modern framework that I use for these situations is SeleniumBase in undetected chromedriver mode. Here's a script that you can use to get past automation detection on Google: (Run with python after installing seleniumbase with pip install -U seleniumbase)
from seleniumbase import SB
with SB(uc=True) as sb:
    sb.open("https://www.google.com/gmail/about/")
    sb.click('a[data-action="sign in"]')
    sb.type('input[type="email"]', "NAME@gmail.com")
    sb.click('button:contains("Next")')
    sb.type('input[type="password"]', "PASSWORD")  # placeholder: use your own password
    sb.click('button:contains("Next")')
    sb.sleep(5)

A slow yet good solution would be delaying every key press. Why? Because Google uses a kind of captcha that analyzes your typing speed, among other things. So if you want to type an email or password like example@example.com, you'd have to do this:
for i in "example@example.com\n": #\n because the submit doesn't work in input fields in google sign in, so \n is equivalent of pressing enter
    element.send_keys(i)
    time.sleep(0.4) #don't forget to import time or something else with which you could delay your code!
time.sleep(1) #also this sleep because when the button will redirect url, it'd not reload the site and selenium will not wait so one more sleep
PS: if not working, try changing the values of sleep or any other delaying function
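The same idea can be wrapped in a reusable helper. This is only a sketch: `type_slowly` is a hypothetical name, and the random jitter on top of the fixed delay is my own assumption about what looks less mechanical, not something Google documents.

```python
import random
import time


def type_slowly(element, text, base_delay=0.4, jitter=0.2, sleep=time.sleep):
    """Send `text` to `element` one character at a time with a pause after each.

    `element` is anything with a send_keys() method, e.g. a Selenium WebElement.
    `sleep` is injectable so the delays can be recorded instead of waited for.
    """
    for char in text:
        element.send_keys(char)
        # Vary the pause a little so every key press is not exactly equal.
        sleep(base_delay + random.uniform(0, jitter))
```

For example, `type_slowly(element, "example@example.com\n")` types the address and presses enter, as in the loop above.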

Google returning different layouts for pagination

I am using selenium and chrome to search on google. But it is returning different layouts for pagination. I am using different proxies and different user agents using the fake_useragent library.
I only want the second image layout. Does anybody know how can I get it every time?
First Image
Second Image

The issue was fake_useragent library was returning old user-agents sometimes even if I update the database. I tried this library(https://pypi.org/project/latest-user-agents/) and it returns newer user-agents.
Here is the working code.
from latest_user_agents import get_latest_user_agents
import random
from selenium import webdriver
PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string so the backslashes are kept literally
proxy = ''
url = ''
user_agent = random.choice(get_latest_user_agents())
options = webdriver.ChromeOptions()
options.add_argument(f'--proxy-server={proxy}')
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(PATH, options=options)
driver.get(url)
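If you want to guard against an outdated user agent slipping through from any source, you could filter candidates by their Chrome major version before choosing one. A minimal sketch; `chrome_major_version` and `recent_chrome_agents` are hypothetical helpers and the version threshold is up to you.

```python
import re

# Captures the major version from a "Chrome/<major>.<minor>..." token.
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major_version(user_agent: str):
    """Return the Chrome major version in a user-agent string, or None."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def recent_chrome_agents(user_agents, minimum_major: int):
    """Keep only Chrome user agents at or above `minimum_major`."""
    return [
        ua for ua in user_agents
        if (chrome_major_version(ua) or 0) >= minimum_major
    ]
```

You could then call `random.choice(recent_chrome_agents(get_latest_user_agents(), 100))`, picking whatever minimum version suits you.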

The difference between the two layouts is when you disable javascript, Google will show the pagination as the first image layout.
To ensure that you get the second layout every time, you would need to make sure javascript is enabled.
If you have a chrome driver from selenium like: options = webdriver.ChromeOptions(), the following would make sure javascript is always enabled:
options.add_argument("--enable-javascript")
Edit based on OP's comment
I got it working by using the latest_user_agents library. The fake_useragent library was returning old user-agents sometimes. That's why it was showing the old layout.
Installing the latest_user_agents library: https://pypi.org/project/latest-user-agents/

Hey, don't try to automate Google and Google products with automation tools, because Google changes the web elements and layout of their pages every day.
For multiple reasons, logging into sites like Gmail and Facebook using WebDriver is not recommended. Aside from being against the usage terms for these sites (where you risk having the account shut down), it is slow and unreliable.
The ideal practice is to use the APIs that email providers offer, or in the case of Facebook the developer tools service which exposes an API for creating test accounts, friends, and so forth. Although using an API might seem like a bit of extra hard work, you will be paid back in speed, reliability, and stability. The API is also unlikely to change, whereas webpages and HTML locators change often and require you to update your test framework.
Logging in to third-party sites using WebDriver at any point of your test increases the risk of your test failing because it makes your test longer. A general rule of thumb is that longer tests are more fragile and unreliable.
WebDriver implementations that are W3C conformant also annotate the navigator object with a WebDriver property so that Denial of Service attacks can be mitigated.
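You can check that property from your own session. The sketch below uses Selenium's `execute_script`; the helper name `reports_webdriver` is hypothetical, and `driver` is assumed to be any object exposing `execute_script`, such as a Selenium WebDriver.

```python
def reports_webdriver(driver) -> bool:
    """Return True if the page sees navigator.webdriver set to true."""
    return bool(driver.execute_script("return navigator.webdriver === true;"))
```

Any site can run the same one-line check in its own JavaScript, which is part of why driving Google or Facebook logins through WebDriver is so easy to detect.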

Open Application (such as zoom.us) with Selenium Webdriver

I want to be able to use pure selenium webdriver to open a zoom link in Chrome and then redirect me to the zoom.us application.
When I execute this:
from selenium import webdriver
def main():
    driver = webdriver.Chrome()
    driver.get("https://zoom.us/j/000-000-000")
main()
I receive a pop-up saying
https://zoom.us wants to open this application.
and I must press a button titled open zoom.us to open the app.
Is there a way to press this pop-up button through selenium. Or, is there some other way to open zoom from chromedriver?
NOTE: I only want to use selenium. I have been able to implement pyautogui to click on the button but that is not what I am looking for.

Solution for Java:
driver.switchTo().alert().accept();
Solution for Python:
driver.switch_to.alert.accept()

There are a lot of duplicated questions regarding this issue. Here is one of them, and it is fairly certain that Selenium cannot do this job, since it only interacts with the Chrome page itself. I previously ran into this issue as well, and here is my solution. It might look really unprofessional, but fortunately it works.
The logic of my solution is to change a Chrome setting so that the popup is skipped and the application opens directly. However, the Chrome team removed this feature in later versions for some reason, so we need to get it back manually. Also, every time Selenium starts, it opens a new Chrome window with NO customized settings, much like an incognito window. Therefore we need to make Selenium open Chrome with your customized settings, so that the popup we manually configured to be skipped is actually skipped.
Type the following code in your terminal.
defaults write com.google.Chrome ExternalProtocolDialogShowAlwaysOpenCheckbox -bool true
This enables you to change the setting of skipping popups, which is the feature Chrome team removed.
Restart Chrome,and open the zoom (or whatever application) page to let the popup display. If you do the 1st step correctly you will be able to see there is a checkbox shown next to the "Open Zoom.us" saying if you check it chrome will open this application without asking, that is, to skip the popup for this application.
Now we need to let selenium open the Chrome with our customized setting. To do this, type "chrome://version" in the search tab of your ordinary Chrome (Not automated page opened by selenium). Go to "Profile Path", and copy this path without the last word "default". For example:
/Users/MYNAME/Library/Application Support/Google/Chrome/Default
This is my profile path, but I only copy everything except the last word Default, so this is what I need to copy.
/Users/MYNAME/Library/Application Support/Google/Chrome/
This is for Mac users, but for Windows only the path is different(starts with C:// or something), steps are same.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
option = Options()
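option.add_argument('--user-data-dir=THE PATH YOU JUST COPIED') # the path has to be passed through the user-data-dir switch
driver = webdriver.Chrome(executable_path='YOUR PATH TO CHROMEDRIVER', options=option)
driver.get("https://google.com") #Or anything else; the URL needs its scheme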
We use "options" to let selenium open a page with our customized profile. Now you will see selenium opens a Chrome page with all your account profile, settings, and it just appears like your ordinary chrome page.
Run your code. But before that, remember to quit ALL CHROME sessions manually. On Mac, make sure there is no dot under the Chrome icon, which indicates that Chrome is not running at all. THIS STEP IS CRITICAL; otherwise Selenium will open a Chrome window and just stop there.
Those are all the steps. Again, this solution is very informal and I personally don't think it is a real "solution" to this problem. I will try to figure out a better way of achieving this in the future. But I still posted this as an alternative because I guess it might be helpful to somebody like me. Hope it works for you, and good luck.
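The "copy the path without the last word" step can also be done in code. This sketch assumes a POSIX-style path like the Mac example above (for Windows paths, `PureWindowsPath` would be the counterpart); `split_profile_path` and `profile_arguments` are hypothetical helper names.

```python
from pathlib import PurePosixPath


def split_profile_path(profile_path: str):
    """Split Chrome's "Profile Path" into (user data dir, profile name)."""
    path = PurePosixPath(profile_path)
    return str(path.parent), path.name


def profile_arguments(profile_path: str):
    """Build the Chrome arguments that select this exact profile."""
    user_data_dir, profile_dir = split_profile_path(profile_path)
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_dir}",
    ]
```

Each returned string can be passed to `option.add_argument(...)`, so the value shown in chrome://version can be pasted in unchanged.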

Python Selenium PhantomJS login dialog

Please note, this question is Python 3.5.2, only Python answers will be accepted. Unless this can definitely be handled in Java? Automating a process as part of an internal project. Everything works just fine using the IE webdriver, but not phantomJS web driver (which is expected due to limited functionality). However, a work-around / solution is required.
When opening the internal site, a Windows Security login dialog box comes up prompting for a username, password and press 'Ok'. With the IE web driver, it is handled just fine with:
loginAlert = driver.switch_to_alert()
loginAlert.authenticate(username, password)
The javascript:
driver.execute_script("window.confirm = function(){return true;}")
Being run before loading the page that gives the prompt, doesn't seem to confirm the login alert, for either phantom or IE. Even if it did, this doesn't type in the login details. As mentioned, it's a Windows Security prompt from the browser, not an element.
Once logged in, the page is reloaded with an ASP.NET_SessionId Cookie which expires once the session is ended. I've tried logging in through IE, then adding the cookie into Phantom, but it doesn't seem to match up the domains.
I've tried using:
driver.save_screenshot(filename) to see what's happening in phantom
Which works with IE driver, but with PhantomJS, only a transparent image is saved. The whole http://username:pass@site.com thing doesn't work for either IE or phantom driver. It can't load / use the URL when this is done.
How can the Windows Security login dialog be handled, or worked around? I tried looking into alternatives, such as pyvirtualdisplay, but found no information on how to get this working with Python 3 on windows.
I have also tried setting phantomjs desired capabilities custom header authentication, but that doesn't seem to do anything for this either.
I have also tried using ActionChains, however they don't work when the Alert is there (in either IE or phantom driver). An UnexpectedAlertPresentException is thrown, even if this is caught and you try to perform the actions, once caught, the alert seems to close.

My bad!
Whilst the username:pass@domain.com didn't work in the IE webdriver - it did work in the PhantomJS web driver.
However, the website has limited browser compatibility - it doesn't load properly in either Chrome or Firefox, it is IE particular.
PhantomJS seems to handle the site the same way as Chrome / Firefox based on page source comparisons.
As such, I am trying to find a way to make the current IE driver invisible / hidden.
I have found:
headless-selenium-for-win using Python
However, despite the user here saying they got it to work, when I try to initialize the driver, it just hangs, the code doesn't proceed and no error messages are provided.
Asking another question regarding this.
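For anyone building the username:pass@domain.com style URL in code, special characters in the credentials need percent-encoding or the URL will not parse. A minimal sketch using only the standard library; `with_credentials` is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, username: str, password: str) -> str:
    """Return `url` with percent-encoded basic-auth credentials embedded."""
    parts = urlsplit(url)
    # Encode everything, including '@' and ':', so they cannot be misread.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The result can be passed straight to `driver.get(...)` in drivers that accept credentials in the URL, which per the above includes PhantomJS but not the IE driver.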

Selenium with Python, how do I get the page output after running a script?

I'm not sure how to find this information, I have found a few tutorials so far about using Python with selenium but none have so much as touched on this.. I am able to run some basic test scripts through python that automate selenium but it just shows the browser window for a few seconds and then closes it.. I need to get the browser output into a string / variable (ideally) or at least save it to a file so that python can do other things on it (parse it, etc).. I would appreciate if anyone can point me towards resources on how to do this. Thanks

using Selenium Webdriver and Python, you would simply access the .page_source property to get the source of the current page.
for example, using Firefox() driver:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com/')
print(driver.page_source)
driver.quit()
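Since the question also asks about saving the output to a file, here is a small sketch that does both. `save_page_source` is a hypothetical helper; `driver` is assumed to be any object with a `page_source` attribute, such as the Firefox driver above.

```python
from pathlib import Path


def save_page_source(driver, path, encoding="utf-8"):
    """Write the current page source to `path` and return it as a string."""
    html = driver.page_source
    Path(path).write_text(html, encoding=encoding)
    return html
```

Call it before `driver.quit()`, for example `html = save_page_source(driver, "page.html")`, and then parse `html` however you like.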

There's a Selenium.getHtmlSource() method in Java, most likely it is also available in Python. It returns the source of the current page as string, so you can do whatever you want with it

Ok, so here is how I ended up doing this, for anyone who needs this in the future..
You have to use firefox for this to work.
1) create a new firefox profile (not necessary but ideal so as to separate this from normal firefox usage), there is plenty of info on how to do this on google, it depends on your OS how you do this
2) get the firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (this automatically saves all pages for a given domain name), you need to configure this to save whichever domains you intend on auto-saving.
3) then just start the selenium server to use the profile you created (below is an example for linux)
cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/
That's it. It will now save all the pages for a given domain name whenever Selenium visits them. Selenium does create a bunch of garbage pages too, so you could delete these with a simple regex; from there it's up to you how to manipulate the saved pages.
