Cannot use chrome cast button with selenium - python

I am trying to use Selenium to cast youtube videos to my chromecast. When I open youtube in chrome normally I see the cast button and it works fine. When I open it with Selenium the cast button is missing, and when I select Cast from menu it gives me the error "No Cast destinations found. Need help?"
I am using python, and have tried lots of combinations of flags with webdriver. Here is what I have
options = webdriver.ChromeOptions()
options.add_argument('--user-data-dir=./ChromeProfile')
options.add_argument('--disable-session-crashed-bubble')
options.add_argument('--disable-save-password-bubble')
options.add_argument('--disable-permissions-bubbles')
options.add_argument('--bwsi')
options.add_argument('--load-media-router-component-extension')
options.add_argument('--enable-video-player-chromecast-support');
excludeList = ['disable-component-update',
'ignore-certificate-errors',
]
options.add_experimental_option('excludeSwitches', excludeList)
chromedriverPath = '/my/path/to/chromedriver'
driver = webdriver.Chrome(chromedriverPath, chrome_options=options)
path = 'https://www.youtube.com/watch?v=Bz9Lza059NU'
driver.get(path);
time.sleep(60) # Let the user actually see something!
driver.quit()

I figured out how to get it working. It seemed to require two steps. Copying my default profile over to somewhere selenium could use it, and figuring out the correct flags to use when opening chrome. The key being selenium automatically added a bunch of flags that I didn't want, so I had to exclude one.
First to find out where my profile is stored, I opened up chrome to this url chrome://version/.
This gave me lots of information, but the important ones were
Command Line: /usr/lib/chromium-browser/chromium-browser --enable-pinch --flag-switches-begin --flag-switches-end
Profile Path: /home/mdorrell/.config/chromium/Default
First I copied my profile to some directory that Selenium could use
cp -R /home/mdorrell/.config/chromium/Default/* /home/mdorrell/ChromeProfile
Then I opened this same page in the browser opened by selenium and got the list of flags that selenium added from the Command Line row. The one that ended up giving me the problems was --disable-default-apps
In the end the code that I needed to add ended up looking like this
options = webdriver.ChromeOptions()
# Set the user data directory
options.add_argument('--user-data-dir=/home/mdorrell/ChromeProfile')
# get list of flags selenium adds that we want to exclude
excludeList = [
'disable-default-apps',
]
options.add_experimental_option('excludeSwitches', excludeList)
chromedriverPath = '/my/path/to/chromedriver'
driver = webdriver.Chrome(chromedriverPath, chrome_options=options)
path = 'https://www.youtube.com/watch?v=Bz9Lza059NU'
driver.get(path);
time.sleep(60) # Let the user actually see something!
driver.quit()

Thanks #MikeD for sharing your answer.
I was having the same issue when I wanted to chrome cast a R Shiny Dashboard via a selenium browser (with RSelenium). If I'd click on Cast it would show me "No Cast destinations found. Need help?", whereas from a normal browser it works fine.
In my case, it worked after excluding two switches (including the ChromeProfile was not necessary), which in R can be done with:
library(RSelenium)
options <- list()
options$chromeOptions$excludeSwitches <- list('disable-background-networking',
'disable-default-apps')
rD <- rsDriver(verbose = FALSE, port = 4570L, extraCapabilities = options)

Related

How to login to website which is detecting bot usage using Selenium [duplicate]

I am running the Chrome driver over Selenium on a Ubuntu server behind a residential proxy network. Yet, my Selenium is being detected. Is there a way to make the Chrome driver and Selenium 100% undetectable?
I have been trying for so long I lost track of the many things I have done including:
Trying different versions of Chrome
Adding several flags and removing some words from the Chrome driver file.
Running it behind a proxy (residential ones also) using incognito mode.
Loading profiles.
Random mouse movements.
Randomising everything.
I am looking for a true version of Selenium that is 100% undetectable. If that ever existed. Or another automation way that is not detectable by bot trackers.
This is part of the starting of the browser:
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)
display = Display(visible=0, size=(sx,sn))
display.start()
randagent = random.randint(0,len(useragents_desktop)-1)
uag = useragents_desktop[randagent]
#this is to prevent ip leaking
preferences =
"webrtc.ip_handling_policy" : "disable_non_proxied_udp",
"webrtc.multiple_routes_enabled": False,
"webrtc.nonproxied_udp_enabled" : False
chrome_options.add_experimental_option("prefs", preferences)
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
wsize = "--window-size=" + str(sx-10) + ',' + str(sn-10)
chrome_options.add_argument(str(wsize) )
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)#this is for the residential proxy
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
The fact that selenium driven WebDriver gets detected doesn't depends on any specific Selenium, Chrome or ChromeDriver version. The Websites themselves can detect the network traffic and can identify the Browser Client i.e. Web Browser as WebDriver controled.
However some generic approaches to avoid getting detected while web-scraping are as follows:
The first and foremost attribute a website can determine your script/program is through your monitor size. So it is recommended not to use the conventional Viewport.
If you need to send multiple requests to a website, you need to keep on changing the user-agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
#Antoine Vastel in his blog site Detecting Chrome Headless mentioned several approaches, which distinguish the Chrome browser from a headless Chrome browser.
User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
A check for the presence of Chrome headless can be done through:
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
console.log("Chrome headless detected");
}
Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. On the opposite, in headless mode, the array returned contains no plugin.
A check for the presence of Plugins can be done through:
if(navigator.plugins.length == 0) {
console.log("It may be Chrome headless");
}
Languages: In Chrome two Javascript attributes enable to obtain languages used by the user: navigator.language and navigator.languages. The first one is the language of the browser UI, while the second one is an array of string representing the user’s preferred languages. However, in headless mode, navigator.languages returns an empty string.
A check for the presence of Languages can be done through:
if(navigator.languages == "") {
console.log("Chrome headless detected");
}
WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver. With a vanilla Chrome and Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc.. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without using any sort of window system and Brian Paul, which is the program that started the open source Mesa graphics library.
A check for the presence of WebGL can be done through:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
console.log("Chrome headless detected");
}
Not all Chrome headless will have the same values for vendor and renderer. Others keep values that could also be found on non headless version. However, Mesa Offscreen and Brian Paul indicates the presence of the headless version.
Browser features: Modernizr library enables to test if a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.
A check for the presence of hairline feature can be done through:
if(!Modernizr["hairline"]) {
console.log("It may be Chrome headless");
}
Missing image: The last on our list also seems to be the most robust, comes from the dimension of the image used by Chrome in case an image cannot be loaded. In case of a vanilla Chrome, the image has a width and height that depends on the zoom of the browser, but are different from zero. In a headless Chrome, the image has a width and an height equal to zero.
A check for the presence of Missing image can be done through:
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
image.src = "http://iloveponeydotcom32188.jg";
image.setAttribute("id", "fakeimage");
body.appendChild(image);
image.onerror = function(){
if(image.width == 0 && image.height == 0) {
console.log("Chrome headless detected");
}
}
References
You can find a couple of similar discussions in:
How to bypass Google captcha with Selenium and python?
How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
tl; dr
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
why not try undetected-chromedriver?
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io Automatically downloads the driver binary and patches it.
Tested until current chrome beta versions
Works also on Brave Browser and many other Chromium based browsers
Python 3.6++
you can install it with: pip install undetected-chromedriver
There are important things you should be ware of:
Due to the inner workings of the module, it is needed to browse programmatically (ie: using .get(url) ). Never use the gui to navigate. Using your keybord and mouse for navigation causes possible detection! New Tabs: same story. If you really need multi-tabs, then open the tab with the blank page (hint: url is data:, including comma, and yes, driver accepts it) and do your thing as usual. If you follow these "rules" (actually its default behaviour), then you will have a great time for now.
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [4]: driver.execute_script('return navigator.webdriver')
In [5]: None # Undetectable!
For Python with Chrome or Chromium-based browsers, there's Selenium-Profiles
It currently supports:
Overwrite device metrics with fake-profiles
Mobile and Desktop emulation
Undetected by Google, Cloudflare, ..
Modifying headers supported using Selenium-Interceptor
Touch Actions
proxies with authentication
making single POST, GET or other requests using driver.requests.fetch(url, options) (syntax)
Installation
pip install selenium-profiles
Example script
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles
mydriver = mydriver()
driver = mydriver.start(profiles.Windows()) # or .Android
# get url
driver.get('https://nowsecure.nl/#relax') # test undetectability
input("Press ENTER to exit: ")
driver.quit() # Execute on the End!
Notes:
The package is licenced under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , means, in case you want to use it for something commercial, you need to ask the author first.
headless support currently isn't guaranteed, but you can use pyvirtualdisplay
What about:
import random
from selenium import webdriver
import time
driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")
username = "username" # data_user
password = "password" # data_pass
driver.get("https://www.depop.com/login/") # get URL
driver.find_element_by_xpath('/html/body/div[1]/div/div[3]/div[2]/button[2]').click() # Accept cookies
split_char_pw = [] # Empty lists
split_char = []
n = 1 # Splitter
for index in range(0, len(username), n):
split_char.append(username[index: index + n])
for user_letter in split_char:
time.sleep(random.uniform(0.1, 0.8))
driver.find_element_by_id("username").send_keys(user_letter)
for index in range(0, len(password), n):
split_char.append(password[index: index + n])
for pw_letter in split_char_pw:
time.sleep(random.uniform(0.1, 0.8))
driver.find_element_by_id("password").send_keys(pw_letter)

Python Selenium Converted to Powershell

Looking to write a script for work to go to one of our websites and auto populate a page for submission. I have created this below with python below but I would like to avoid downloading anything extra onto our servers (ie. Python). Wondering if there is a library in powershell like selenium for python. Is there a way to find the xpath or name of buttons in IE like you do in chrome?
Python script below:
import time
from selenium import webdriver
#Go to website Site
driver = webdriver.Chrome("C:/WebDrivers/chromedriver.exe") # Optional argument, if not specified will search path.
driver.get('yourwebsite');
time.sleep(2) # Let page load!
#Log In with Credentials
search_box = driver.find_element_by_name("txtUserName")
search_box.send_keys('YourUsername')#Your Username
search_box1 = driver.find_element_by_name("txtPassword")
search_box1.send_keys('YourPassword')#Your Password
submit_button = driver.find_element_by_name('btnLogin')
submit_button.click()
time.sleep(10) # Let page load!

Selenium: difference between chrome and PhantomJS?-python

I want to do web scraping for Bing's search results. Basically, I am using selenium, the idea is to using selenium to click 'Next' automatedly and scrap the URLs of search results of each page. I made it run with chrome browser on my Ubuntu:
from selenium import web driver
import os
class bingURL(object):
def __init__(self):
self.driver=webdriver.Chrome(os.path.expanduser('./chromedriver'))
def get_urls(self,url):
driver=self.driver
driver.get(url)
elems = driver.find_elements_by_xpath("//a[#href]")
href=[]
for elem in elems:
link=elem.get_attribute("href")
try:
if 'bing.com' not in link and 'http' in link and 'microsoft.com' not in link and 'smashboards.com' not in link:
href.append(link)
except:
pass
return list(set(href))
def search_urls(self,keyword,pagenum):
driver=self.driver
searchurl=self.lookup(keyword) ### url of first page of google search
driver.get(searchurl)
results=self.get_urls(searchurl)
for i in range(pagenum):
driver.find_elements_by_class_name("sb_pagN")[0].click() # click 'Next' of bing search result
time.sleep(5) # wait to load page
current_url=driver.current_url
#print(current_url)
#print(self.get_urls(current_url))
results[0:0]=self.get_urls(current_url)
driver.quit()
return results
def lookup(self,query):
return "https://www.bing.com/search?q="+query
if __name__ == "__main__":
g=bingURL()
result=g.search_urls('Stackoverflow is good',10)
it works perfectly, when I run the code, it launches a Chrome browser, and I can saw it go to the next page automatically, and get URLs for 10 pages of searching results.
However, my goal is to run these codes on AWS successfully. The original codes failed with error 'Chrome failed to start'. After google, it seems I need to use a headless browser like PhantomJS on AWS. Thus I installed PhantomJS, and change the def __init__(self): to:
def __init__(self):
self.driver=webdriver.PhantomJS()
However, it cannot click 'next' anymore, and cannot scrap URLs using the old code. The error message is:
File ".../SEARCH_BING_MODULE.py", line 70, in search_urls
driver.find_elements_by_class_name("sb_pagN")[0].click()
IndexError: list index out of range
It looks like change the browser completely change the rules. How should I modify the more original code to make it work again? or how to scrap Bing search results' URLs using selenium+PhantomJS?
Thanks for your help!
Yes, You can perform all operations as per of your all 3 point using headless browser. Don't use HTMLUnit as it have many configuration issue.
PhamtomJS was another approach for headless browser but PhantomJs is having bug these days because of poorly maintenance of it.
You can use chromedriver itself for headless jobs.
You just need to pass one option in chromedriver as below:-
chromeOptions.addArguments("--headless");
Full code will appear like this :-
System.setProperty("webdriver.chrome.driver","D:\\Workspace\\JmeterWebdriverProject\\src\\lib\\chromedriver.exe");
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--start-maximized");
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://www.google.co.in/");
Hope it will help you :)

chrome options in robot framework

I am trying to download a file from a link on webpage. However I get annoying warning "This type of file can harm...anyway? Keep, Discard". I tried several options to avoid the warning but still getting it. I am using robot framework however I am using python to create new keyword for me.
#keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
options.add_argument("--disable-web-security")
options.add_argument("--allow-running-insecure-content")
options.add_argument("--safebrowsing-disable-extension-blacklist")
options.add_argument("--safebrowsing-disable-download-protection")
prefs = {'safebrowsing.enabled': 'true'}
options.add_experimental_option("prefs", prefs)
self.open_browser(url, 'chrome',alias=None, remote_url=False, desired_capabilities=options.to_capabilities(), ff_profile_dir=None)
Can someone please suggest a way to disable the download warning?
I found an answer with some research. For some reason (may be a bug) open_browser does not set capabilities for chrome.
So, the alternative is to use 'create_webdriver'. Worked with following code:
#keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
options.add_argument("--disable-web-security")
options.add_argument("--allow-running-insecure-content")
options.add_argument("--safebrowsing-disable-extension-blacklist")
options.add_argument("--safebrowsing-disable-download-protection")
prefs = {'safebrowsing.enabled': 'true'}
options.add_experimental_option("prefs", prefs)
instance = self.create_webdriver('Chrome', desired_capabilities=options.to_capabilities())
self.go_to(url)
You need to add all the params in list. Then pass this list to Dictionary object and pass it to open browser.
Ex.
${list} = Create List --start-maximized --disable-web-security
${args} = Create Dictionary args=${list}
${desired caps} = Create Dictionary platform=${OS} chromeOptions=${args}
Open Browser https://www.google.com remote_url=${grid_url} browser=${BROWSER} desired_capabilities=${desired caps}
The below would be a simpler solution:
Open Browser ${URL} ${BROWSER} options=add_argument("--disable-notifications")
for multiple option, you can use with ; seperated.
options=add_argument("--disable-popup-blocking"); add_argument("--ignore-certificate-errors")
It is better not to disable any security features or any other defaults(unless there is ample justification) which comes with the browser "just to solve one problem", it would be better to find the solution by not touching it at all, and
just make use of requests modules in python and use the same as keyword, wherever you would want later on in all of your codebases. The reason for this approach is that it is better to get the job done making use of ubiquitous modules rather than spending time on one module for extensive amounts of time, I use to do that before, better install requests + robotframework-requests library and others to just get the job completed.
Just use the below code to create a keyword out of it and call it wherever you want, instead of going through the hassle of fixing browser behavior.
import requests
file_url = "http://www.africau.edu/images/default/sample.pdf"
r = requests.get(file_url, stream=True)
with open("sample.pdf", "wb") as pdf:
for chunk in r.iter_content(chunk_size=1024):
# writing one chunk at a time to pdf file
if chunk:
pdf.write(chunk)
This worked for me (must use SeleniumLibrary 4). Modify Chrome so it downloads PDFs instead of viewing them:
${chrome_options}= Evaluate sys.modules['selenium.webdriver'].ChromeOptions() sys, selenium.webdriver
${disabled} Create List Chrome PDF Viewer PrintFileServer
${prefs} Create Dictionary download.prompt_for_download=${FALSE} plugins.always_open_pdf_externally=${TRUE} plugins.plugins_disabled=${disabled}
Call Method ${chrome_options} add_experimental_option prefs ${prefs}
${desired_caps}= Create Dictionary browserName=${browserName} version=${version} platform=${platform} screenResolution=${screenResolution} record_video=${record_video} record_network=${record_network} build=${buildNum} name=${globalTestName}
Open Browser url=${LOGINURL} remote_url=${remote_url} options=${chrome_options} desired_capabilities=${desired_caps}

How can I download a file on a click event using selenium?

I am working on python and selenium. I want to download file from clicking event using selenium. I wrote following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from links with name "Export Data" from given url. How can I achieve it as it works with click event only?
Find the link using find_element(s)_by_*, then call click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.
I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
window.file_contents = reader.result;
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
# Returns the file retrieved base64 encoded (perfect for downloading binary)
downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
print(downloaded_file)
if not downloaded_file:
print('\tNot downloaded, waiting...')
time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explaination
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain, without running into cross site scripting issues.
Next, we're injecting some javascript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we take the response and load it into a FileReader object. From there we can extract the base64 encoded content of the file by calling readAsDataUrl(). We're then taking the base64 encoded content and appending it to window, a gobally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.
In chrome what I do is downloading the files by clicking on the links, then I open chrome://downloads page and then retrieve the downloaded files list from shadow DOM like this:
docs = document
.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloads-list')
.getElementsByTagName('downloads-item')
This solution is restrained to chrome, the data also contains information like file path and download date. (note this code is from JS, may not be the correct python syntax)
Here is the full working code. You can use web scraping to enter the username password and other field. For getting the field names appearing on the webpage, use inspect element. Element name(Username,Password or Click Button) can be entered through class or name.
from selenium import webdriver
# Using Chrome to access web
options = webdriver.ChromeOptions()
options.add_argument("download.default_directory=C:/Test") # Set the download Path
driver = webdriver.Chrome(options=options)
# Open the website
try:
driver.get('xxxx') # Your Website Address
password_box = driver.find_element_by_name('password')
password_box.send_keys('xxxx') #Password
download_button = driver.find_element_by_class_name('link_w_pass')
download_button.click()
driver.quit()
except:
driver.quit()
print("Faulty URL")

Categories