How do you get the geckodriver version from python selenium firefox webdriver? - python

I'm trying to write in some defensive code to prevent someone from executing a script should they have an older version of geckodriver installed. I cannot for the life of me seem to get the geckodriver version from the webdriver object.
The closest I found is driver.capabilities which contains the firefox browser version, but not the geckodriver version.
from selenium import webdriver
driver = webdriver.Firefox()
pprint(driver.capabilities)
output:
{'acceptInsecureCerts': True,
'browserName': 'firefox',
'browserVersion': '60.0',
'moz:accessibilityChecks': False,
'moz:headless': False,
'moz:processID': 18584,
'moz:profile': '/var/folders/qz/0dsxssjd1133p_y44qbdszn00000gp/T/rust_mozprofile.GsKFWZ9kFgMT',
'moz:useNonSpecCompliantPointerOrigin': False,
'moz:webdriverClick': True,
'pageLoadStrategy': 'normal',
'platformName': 'darwin',
'platformVersion': '17.5.0',
'rotatable': False,
'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000}}
Is it possible the browser version and geckodriver versions are linked directly? if not, how can I check the geckodriver version from within python?

There is no method in the python bindings to get the geckodriver version, you will have to implement it yourself, my first option would be subprocess
# Mind the encoding, it must match your system's
output = subprocess.run(['geckodriver', '-V'], stdout=subprocess.PIPE, encoding='utf-8')
version = output.stdout.splitlines()[0].split()[-1]

It appears that moz:geckodriverVersion has been added to the capabilities sometime late 2018.
driverversion = driver.capabilities['moz:geckodriverVersion']
browserversion = driver.capabilities['browserVersion']

Related

How to login to website which is detecting bot usage using Selenium [duplicate]

I am running the Chrome driver over Selenium on a Ubuntu server behind a residential proxy network. Yet, my Selenium is being detected. Is there a way to make the Chrome driver and Selenium 100% undetectable?
I have been trying for so long I lost track of the many things I have done including:
Trying different versions of Chrome
Adding several flags and removing some words from the Chrome driver file.
Running it behind a proxy (residential ones also) using incognito mode.
Loading profiles.
Random mouse movements.
Randomising everything.
I am looking for a true version of Selenium that is 100% undetectable. If that ever existed. Or another automation way that is not detectable by bot trackers.
This is part of the starting of the browser:
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)
display = Display(visible=0, size=(sx,sn))
display.start()
randagent = random.randint(0,len(useragents_desktop)-1)
uag = useragents_desktop[randagent]
#this is to prevent ip leaking
preferences =
"webrtc.ip_handling_policy" : "disable_non_proxied_udp",
"webrtc.multiple_routes_enabled": False,
"webrtc.nonproxied_udp_enabled" : False
chrome_options.add_experimental_option("prefs", preferences)
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
wsize = "--window-size=" + str(sx-10) + ',' + str(sn-10)
chrome_options.add_argument(str(wsize) )
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)#this is for the residential proxy
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
The fact that selenium driven WebDriver gets detected doesn't depends on any specific Selenium, Chrome or ChromeDriver version. The Websites themselves can detect the network traffic and can identify the Browser Client i.e. Web Browser as WebDriver controled.
However some generic approaches to avoid getting detected while web-scraping are as follows:
The first and foremost attribute a website can determine your script/program is through your monitor size. So it is recommended not to use the conventional Viewport.
If you need to send multiple requests to a website, you need to keep on changing the user-agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
#Antoine Vastel in his blog site Detecting Chrome Headless mentioned several approaches, which distinguish the Chrome browser from a headless Chrome browser.
User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
A check for the presence of Chrome headless can be done through:
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
console.log("Chrome headless detected");
}
Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. On the opposite, in headless mode, the array returned contains no plugin.
A check for the presence of Plugins can be done through:
if(navigator.plugins.length == 0) {
console.log("It may be Chrome headless");
}
Languages: In Chrome two Javascript attributes enable to obtain languages used by the user: navigator.language and navigator.languages. The first one is the language of the browser UI, while the second one is an array of string representing the user’s preferred languages. However, in headless mode, navigator.languages returns an empty string.
A check for the presence of Languages can be done through:
if(navigator.languages == "") {
console.log("Chrome headless detected");
}
WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver. With a vanilla Chrome and Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc.. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without using any sort of window system and Brian Paul, which is the program that started the open source Mesa graphics library.
A check for the presence of WebGL can be done through:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
console.log("Chrome headless detected");
}
Not all Chrome headless will have the same values for vendor and renderer. Others keep values that could also be found on non headless version. However, Mesa Offscreen and Brian Paul indicates the presence of the headless version.
Browser features: Modernizr library enables to test if a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.
A check for the presence of hairline feature can be done through:
if(!Modernizr["hairline"]) {
console.log("It may be Chrome headless");
}
Missing image: The last on our list also seems to be the most robust, comes from the dimension of the image used by Chrome in case an image cannot be loaded. In case of a vanilla Chrome, the image has a width and height that depends on the zoom of the browser, but are different from zero. In a headless Chrome, the image has a width and an height equal to zero.
A check for the presence of Missing image can be done through:
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
image.src = "http://iloveponeydotcom32188.jg";
image.setAttribute("id", "fakeimage");
body.appendChild(image);
image.onerror = function(){
if(image.width == 0 && image.height == 0) {
console.log("Chrome headless detected");
}
}
References
You can find a couple of similar discussions in:
How to bypass Google captcha with Selenium and python?
How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
tl; dr
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
why not try undetected-chromedriver?
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io Automatically downloads the driver binary and patches it.
Tested until current chrome beta versions
Works also on Brave Browser and many other Chromium based browsers
Python 3.6++
you can install it with: pip install undetected-chromedriver
There are important things you should be ware of:
Due to the inner workings of the module, it is needed to browse programmatically (ie: using .get(url) ). Never use the gui to navigate. Using your keybord and mouse for navigation causes possible detection! New Tabs: same story. If you really need multi-tabs, then open the tab with the blank page (hint: url is data:, including comma, and yes, driver accepts it) and do your thing as usual. If you follow these "rules" (actually its default behaviour), then you will have a great time for now.
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [4]: driver.execute_script('return navigator.webdriver')
In [5]: None # Undetectable!
For Python with Chrome or Chromium-based browsers, there's Selenium-Profiles
It currently supports:
Overwrite device metrics with fake-profiles
Mobile and Desktop emulation
Undetected by Google, Cloudflare, ..
Modifying headers supported using Selenium-Interceptor
Touch Actions
proxies with authentication
making single POST, GET or other requests using driver.requests.fetch(url, options) (syntax)
Installation
pip install selenium-profiles
Example script
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles
mydriver = mydriver()
driver = mydriver.start(profiles.Windows()) # or .Android
# get url
driver.get('https://nowsecure.nl/#relax') # test undetectability
input("Press ENTER to exit: ")
driver.quit() # Execute on the End!
Notes:
The package is licenced under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , means, in case you want to use it for something commercial, you need to ask the author first.
headless support currently isn't guaranteed, but you can use pyvirtualdisplay
What about:
import random
from selenium import webdriver
import time
driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")
username = "username" # data_user
password = "password" # data_pass
driver.get("https://www.depop.com/login/") # get URL
driver.find_element_by_xpath('/html/body/div[1]/div/div[3]/div[2]/button[2]').click() # Accept cookies
split_char_pw = [] # Empty lists
split_char = []
n = 1 # Splitter
for index in range(0, len(username), n):
split_char.append(username[index: index + n])
for user_letter in split_char:
time.sleep(random.uniform(0.1, 0.8))
driver.find_element_by_id("username").send_keys(user_letter)
for index in range(0, len(password), n):
split_char.append(password[index: index + n])
for pw_letter in split_char_pw:
time.sleep(random.uniform(0.1, 0.8))
driver.find_element_by_id("password").send_keys(pw_letter)

I cannot get Chrome to default to saving as a PDF when using Selenium

I'm trying to save some web pages to PDF using Python, Selenium, and Chrome, and I can't get the printer to default to Chrome's built-in "save as PDF" option.
I have found examples of how to do this in various places online, including in questions people have asked on Stack Overflow, but they way they're all implementing it doesn't work and I'm not sure if something has changed in more recent versions of Chrome, or if I'm somehow doing something wrong (for example, here is a page that has these settings: Missing elements when using selenium chrome driver to automatically 'Save as PDF').
I only included the default download location change in this code to verify it's accepting any changes at all - if you download one of the Python installs from that page, it will download to the new location and not to the standard download folder, so Chrome seems to be accepting these changes.
The problem appears to be the option "selectedDestinationID", which doesn't seem to do anything.
from selenium import webdriver
import time
import json
chrome_options = webdriver.ChromeOptions()
app_state = {
'recentDestinations': [{
'id': 'Save as PDF',
'origin': 'local'
}],
'selectedDestinationId': 'Save as PDF',
'version': 2
}
prefs = {
'printing.print_preview_sticky_settings.appState': json.dumps(app_state),
'download.default_directory': 'c:\\temp\\seleniumtesting\\'
}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path='C:\\temp\\seleniumtesting\\chromedriver.exe', options=chrome_options)
driver.get('https://www.python.org/downloads/release/python-373/')
time.sleep(25)
driver.close()
After the page launches, hitting ctrl+p brings up the printing page, but it defaults to the default printer. If I bring up the same page in my standard Chrome installation, it defaults to printing to PDF. I want to get to the point where I can add kiosk printing and then call window.print(), but as of now all that does is send it to the actual paper printer.
Thanks for any help anyone can offer. I'm stumped, and at this point it probably would have been faster to just save all of these manually.
It seems that if you have network printers configured they load up after opening the dialog and override your selectedDestination.
There is a preference "printing.default_destination_selection_rules" which seems to resolve.
prefs = {
"printing.print_preview_sticky_settings.appState": json.dumps(app_state),
"download.default_directory": "c:\\temp\\seleniumtesting\\".startswith(),
"printing.default_destination_selection_rules": {
"kind": "local",
"namePattern": "Save as PDF",
},
}
https://chromium.googlesource.com/chromium/src/+/master/chrome/common/pref_names.cc#1318
https://www.chromium.org/administrators/policy-list-3#DefaultPrinterSelection

How to download file using remote Firefox webdriver?

I've tried to adapt several existing solutions (1, 2) to the remote Firefox webdriver running in a selenium/standalone-firefox Docker container:
options = Options()
options.set_preference('browser.download.dir', '/src/app/output')
options.set_preference('browser.download.folderList', 2)
options.set_preference('browser.download.manager.showWhenStarting', False)
options.set_preference('browser.helperApps.alwaysAsk.force', False)
options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')
options.set_preference('pdfjs.disabled', True)
options.set_preference('pdfjs.enabledCache.state', False)
options.set_preference('plugin.disable_full_page_plugin_for_types', False)
cls.driver = webdriver.Remote(
command_executor='http://selenium:4444/wd/hub',
desired_capabilities={'browserName': 'firefox', 'acceptInsecureCerts': True},
options=options
)
Navigating and clicking the relevant download button works fine, but the file never appears in the download directory. I've verified everything I can think of:
The user in the Selenium container can create files in /src/app/output and those files are visible in the host OS.
I can download the file successfully using my desktop browser.
The response content type is application/pdf.
What am I missing?
It turned out other changes done while researching this were resulting in the server returning a text/plain document rather than a PDF file. For reference, this is the simplest set of options I could get to work:
options.set_preference('browser.download.dir', DOWNLOAD_DIRECTORY)
options.set_preference('browser.download.folderList', 2)
options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')
options.set_preference('pdfjs.disabled', True)

How to authenticate and browse to another URL with python Selenium Webdriver

I am using Python Selenium Chrome WebDriver
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
and
self.driver = webdriver.Chrome(chrome_options=options, desired_capabilities=capabilities)
print self.driver.get('https://192.168.178.20:1337/login?email=me#domain.com&password=mypassword')
print self.driver.get('https://192.168.178.20:1337/this/that?name=john')
Before I didn't need to authenticate and my GET went through, but now I do using PUT request with email and password params. I have tested the PUT in POSTMAN and it worked fine.
Once authenticated, I want to browse to another URL using GET, but I am getting a 500 most likely because it didn't retain that I have authenticated.
How can I check that my login worked? How do I retrieve the response?
Do I need to retrieve & save some kind of token or cookie for the 2nd request go through?
Console log
headless_chrome > Auth
None
headless_chrome > GET
None
headless_chrome > CONSOLE
headless_chrome console > {u'source': u'network', u'message': u'https://192.168.178.20:1337/this/that?name=john 0:0 Failed to load resource: the server responded with a status of 500 (Internal Server Error)', u'timestamp': 1479212713208, u'level': u'SEVERE'}
headless_chrome > title:
{'_file_detector': <selenium.webdriver.remote.file_detector.LocalFileDetector object at 0x7f8507a25f50>,
'_is_remote': False,
'_mobile': <selenium.webdriver.remote.mobile.Mobile object at 0x7f8507a25d90>,
'_switch_to': <selenium.webdriver.remote.switch_to.SwitchTo instance at 0x7f8507a336c8>,
'capabilities': {u'acceptSslCerts': True,
u'applicationCacheEnabled': False,
u'browserConnectionEnabled': False,
u'browserName': u'chrome',
u'chrome': {u'chromedriverVersion': u'2.21.371461 (633e689b520b25f3e264a2ede6b74ccc23cb636a)',
u'userDataDir': u'/tmp/.com.google.Chrome.ybR9Fm'},
u'cssSelectorsEnabled': True,
u'databaseEnabled': False,
u'handlesAlerts': True,
u'hasTouchScreen': False,
u'javascriptEnabled': True,
u'locationContextEnabled': True,
u'mobileEmulationEnabled': False,
u'nativeEvents': True,
u'platform': u'Linux',
u'rotatable': False,
u'takesHeapSnapshot': True,
u'takesScreenshot': True,
u'version': u'50.0.2661.102',
u'webStorageEnabled': True},
'command_executor': <selenium.webdriver.chrome.remote_connection.ChromeRemoteConnection object at 0x7f8507a25cd0>,
'error_handler': <selenium.webdriver.remote.errorhandler.ErrorHandler object at 0x7f8507a25d50>,
'service': <selenium.webdriver.chrome.service.Service object at 0x7f85083bc3d0>,
'session_id': u'72a5ce48d950be26b3f33de4adb34428',
'w3c': False}
headless_chrome > except else
clean up Selenium browser
Selenium webdriver has no .put method. You should use .get for both authentication and then navigating to another url. Ideally this should work.
If that works manually then it should work through webdriver also.

Is there a way to use PhantomJS in Python?

I want to use PhantomJS in Python. I googled this problem but couldn't find proper solutions.
I find os.popen() may be a good choice. But I couldn't pass some arguments to it.
Using subprocess.Popen() may be a proper solution for now. I want to know whether there's a better solution or not.
Is there a way to use PhantomJS in Python?
The easiest way to use PhantomJS in python is via Selenium. The simplest installation method is
Install NodeJS
Using Node's package manager install phantomjs: npm -g install phantomjs-prebuilt
install selenium (in your virtualenv, if you are using that)
After installation, you may use phantom as simple as:
from selenium import webdriver
driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()
If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:
driver = webdriver.PhantomJS() # or add to your PATH
... with the following:
driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
References:
http://selenium-python.readthedocs.io/
How do I set a proxy for phantomjs/ghostdriver in python webdriver?
https://dzone.com/articles/python-testing-phantomjs
PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.
A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:
from ghost import Ghost
ghost = Ghost()
with ghost.start() as session:
page, extra_resources = ghost.open("http://jeanphi.me")
assert page.http_status==200 and 'jeanphix' in ghost.content
Now since the GhostDriver comes bundled with the PhantomJS, it has become even more convenient to use it through Selenium.
I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess standalone installation didn't provided these features earlier, but as of v1.9, it very much does so.
Install PhantomJS (http://phantomjs.org/download.html) (If you are on Linux, following instructions will help https://stackoverflow.com/a/14267295/382630)
Install Selenium using pip.
Now you can use like this
import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing
driver.quit()
Here's how I test javascript using PhantomJS and Django:
mobile/test_no_js_errors.js:
var page = require('webpage').create(),
system = require('system'),
url = system.args[1],
status_code;
page.onError = function (msg, trace) {
console.log(msg);
trace.forEach(function(item) {
console.log(' ', item.file, ':', item.line);
});
};
page.onResourceReceived = function(resource) {
if (resource.url == url) {
status_code = resource.status;
}
};
page.open(url, function (status) {
if (status == "fail" || status_code != 200) {
console.log("Error: " + status_code + " for url: " + url);
phantom.exit(1);
}
phantom.exit(0);
});
mobile/tests.py:
import subprocess
from django.test import LiveServerTestCase
class MobileTest(LiveServerTestCase):
def test_mobile_js(self):
args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
result = subprocess.check_output(args)
self.assertEqual(result, "") # No result means no error
Run tests:
manage.py test mobile
The answer by #Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:
Install PhantomJS
As #Vivin-Paliath points out, it's a standalone project, not part of Node.
Mac:
brew install phantomjs
Ubuntu:
sudo apt-get install phantomjs
etc
Set up a virtualenv (if you haven't already):
virtualenv mypy # doesn't have to be "mypy". Can be anything.
. mypy/bin/activate
If your machine has both Python 2 and 3 you may need run virtualenv-3.6 mypy or similar.
Install selenium:
pip install selenium
Try a simple test, like this borrowed from the docs:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.PhantomJS()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
this is what I do, python3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.
command = "phantomjs --ignore-ssl-errors=true "+<your js file for phantom>
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
output, errors = process.communicate(timeout=30)
except Exception as e:
print("\t\tException: %s" % e)
process.kill()
# output will be weird, decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
phantom_output += out_line.decode('utf-8')
If using Anaconda, install with:
conda install PhantomJS
in your script:
from selenium import webdriver
driver=webdriver.PhantomJS()
works perfectly.
In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.
[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs
That part installs node.js as binary (at least on my system) and then uses npm to install PhantomJS. Finally it creates an entry point bin/phantomjs, which you can call the PhantomJS webdriver with. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)
driver = webdriver.PhantomJS('bin/phantomjs')

Categories