ProfilesIni profile = new ProfilesIni();
FirefoxProfile ffprofile = profile.getProfile("default");//using firefox default profile
ffprofile.setPreference("permissions.default.image", 2); // this make ff to block web page images
WebDriver ff = new FirefoxDriver(ffprofile); // executing firefox with specified profile
ff.navigate().to("https://www.google.com"); // loading web page (WebDriver needs the full URL, including the scheme)
//codes for changing image blocking ???????????
How can I change the image blocking after loading some web pages?
It is possible to modify preferences in flight via dev toolbar CLI but it may introduce higher overhead than loading images.
Here is a Python example:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
ff = webdriver.Firefox()
ff.get('http://<URL>')
ac = ActionChains(ff)
# SHIFT+F2 opens dev toolbar
ac.key_down(Keys.SHIFT).send_keys(Keys.F2).key_up(Keys.SHIFT).perform()
# command to disable images
ac.send_keys('pref set permissions.default.image 2').perform()
ac.send_keys(Keys.ENTER).perform()
# command to disable flash
ac.send_keys('pref set plugin.state.flash 0').perform()
ac.send_keys(Keys.ENTER).perform()
# disable dev toolbar
ac.key_down(Keys.SHIFT).send_keys(Keys.F2).key_up(Keys.SHIFT).perform()
ac.key_down(Keys.SHIFT).send_keys(Keys.F2).key_up(Keys.SHIFT).perform()
# reload the page to confirm there are no images or flash
ff.refresh()
The dev toolbar CLI does not work for me (the key combination does not open the CLI), so here is how I managed to change profile settings of a running Firefox instance:
IWebDriver driver = //your instance
driver.Navigate().GoToUrl("about:config");
driver.FindElement(By.Id("warningButton")).Click();
IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
js.ExecuteScript("window.Services.prefs.setIntPref('permissions.default.image', " + 2 + ")");
It's C#, but converting it to Java should not be too hard.
The idea is that the about:config tab exposes a number of JavaScript objects that can change the profile settings, so we just have to go to that page and execute some JS code.
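For anyone on the Python bindings, the same idea might look roughly like the sketch below. It assumes, as in the C# above, that the about:config page shows a button with id "warningButton" and exposes window.Services; both depend on the Firefox version, so treat them as assumptions. set_int_pref is a hypothetical helper name, not part of Selenium.

```python
import json


def set_int_pref(driver, name, value):
    """Set an integer Firefox preference on a running session via about:config."""
    driver.get("about:config")
    # "id" is the string value of Selenium's By.ID locator strategy.
    # Assumption: the warning button exists under this id in your Firefox version.
    driver.find_element("id", "warningButton").click()
    # json.dumps quotes the preference name safely for use inside JavaScript.
    script = "window.Services.prefs.setIntPref(%s, %d);" % (json.dumps(name), int(value))
    driver.execute_script(script)
    return script
```

Usage would be something like set_int_pref(driver, "permissions.default.image", 2) to block images, followed by navigating back to the page under test.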
Run Firefox from the command line with firefox.exe -p
After that, create a new profile, set the necessary settings, and always load that profile:
FirefoxProfile ffprofile = profile.getProfile("profileName");
I am attempting to load a chrome browser with selenium using my existing account and settings from my profile.
I can get this working using ChromeOptions to set the user data directory and the profile directory. This loads the browser with my profile like I want, but the browser then hangs for 60 seconds and times out without advancing through any more of the automation.
If I don't use the user data dir and profile settings, it works fine but doesn't use my profile.
The reading I've done points to not being able to have more than one browser open at a time with the same profile so I made sure nothing was open while I ran the program. It still hangs for 60 seconds even without another browser open.
m_Options = new ChromeOptions();
m_Options.AddArgument("--user-data-dir=C:/Users/Me/AppData/Local/Google/Chrome/User Data");
m_Options.AddArgument("--profile-directory=Default");
m_Options.AddArgument("--disable-extensions");
m_Driver = new ChromeDriver(@"pathtoexe", m_Options);
m_Driver.Navigate().GoToUrl("somesite");
It always hangs on the GoToUrl. I'm not sure what else to try.
As per your code trials, you were trying to load the Default Chrome Profile, which goes against best practices because the Default Chrome Profile may contain any of the following:
Extensions
Bookmarks
Browsing History
etc
So the Default Chrome Profile may not comply with your test specification and may raise an exception while loading. Hence you should always use a customized Chrome Profile, as below.
To create and open a new Chrome Profile you need to follow the following steps :
Open Chrome browser, click on the Side Menu and click on Settings on which the url chrome://settings/ opens up.
In People section, click on Manage other people on which a popup comes up.
Click on ADD PERSON, provide the person name, select an icon, keep the item Create a desktop shortcut for this user checked and click on ADD button.
Your new profile gets created.
Snapshot of a new profile SeLeNiUm
Now a desktop icon will be created as SeLeNiUm - Chrome
From the properties of the desktop icon SeLeNiUm - Chrome get the name of the profile directory. e.g. --profile-directory="Profile 2"
Get the absolute path of the profile-directory in your system as follows :
C:\\Users\\Thranor\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 2
Now pass the value of profile-directory through an instance of ChromeOptions with AddArgument method along with key user-data-dir as follows :
m_Options = new ChromeOptions();
m_Options.AddArgument("--user-data-dir=C:/Users/Me/AppData/Local/Google/Chrome/User Data/Profile 2");
m_Options.AddArgument("--disable-extensions");
m_Driver = new ChromeDriver(@"pathtoexe", m_Options);
m_Driver.Navigate().GoToUrl("somesite");
Execute your Test
Observe Chrome gets initialized with the Chrome Profile as SeLeNiUm
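For comparison, the two-switch form used in the question (--user-data-dir plus --profile-directory) can be derived from a full profile path. This is only a sketch assuming a Windows-style path; chrome_profile_args is a hypothetical helper, not something Selenium provides.

```python
from pathlib import PureWindowsPath


def chrome_profile_args(profile_path):
    """Split a full profile path into the two Chrome command-line switches."""
    path = PureWindowsPath(profile_path)
    return [
        # The parent folder ("User Data") is what --user-data-dir expects
        # when it is combined with --profile-directory.
        "--user-data-dir=" + str(path.parent),
        # The last path component is the profile folder name, e.g. "Profile 2".
        "--profile-directory=" + path.name,
    ]


args = chrome_profile_args(
    r"C:\Users\Me\AppData\Local\Google\Chrome\User Data\Profile 2"
)
for arg in args:
    print(arg)
```

Each resulting string would then be passed to AddArgument (C#) or add_argument (Python) on the ChromeOptions instance.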
If you want to run Chrome using your default profile (because you need an extension), you need to run your script using another browser, like Microsoft Edge or Microsoft IE, and your code will launch a Chrome instance.
My Code in PHP:
namespace Facebook\WebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Chrome\ChromeOptions;
require_once('vendor/autoload.php');
$host = 'http://localhost:4444/';
$options = new ChromeOptions();
$options->addArguments(array(
'--user-data-dir=C:\Users\paulo\AppData\Local\Google\Chrome\User Data',
'--profile-directory=Default',
'--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
));
$caps = DesiredCapabilities::chrome();
$caps->setCapability(ChromeOptions::CAPABILITY, $options);
$caps->setPlatform("Windows");
$driver = RemoteWebDriver::create($host, $caps);
$driver ->manage()->window()->maximize();
$driver->get('https://www.google.com/');
// your code goes here.
$driver->quit();
Hi guys, in my environment, with Chrome 63 controlled by Selenium, I ran into the same problem (a 60-second wait before the web page opens).
I fixed it by setting a default startup page in Chrome's ./[user-data-dir]/[Profile]/Preferences file. This is the JSON that needs to be inserted into the "Preferences" file to get that result:
...
"session":{
"restore_on_startup":4,
"startup_urls":[
"http://localhost/test1"
]
}
...
To set these "Preferences" from Selenium, I used this sample code:
ChromeOptions chromeOptions = new ChromeOptions();
//set my user data dir
chromeOptions.addArguments("--user-data-dir=/usr/chromeDataDir/");
//start create data structure to for insert json in "Preferences" file
Map<String, Object> prefs = new HashMap<String, Object>();
prefs.put("session.restore_on_startup", 4);
List<String> urlList = new ArrayList<String>();
urlList.add("http://localhost/test1");
prefs.put("session.startup_urls", urlList);
//set in chromeOptions data structure
chromeOptions.setExperimentalOption("prefs", prefs);
//start chrome
ChromeDriver chromeDriver = new ChromeDriver(chromeOptions);
//this get command for open web page, response instant
chromeDriver.get("http://localhost/test2");
I found this information at https://chromedriver.chromium.org/capabilities
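To make the link between the dotted keys in the Java code and the nested JSON in the Preferences file explicit, here is a small sketch of the expansion. nest_prefs is a hypothetical helper written only for illustration; ChromeDriver does this mapping itself when it receives the "prefs" option.

```python
import json


def nest_prefs(flat_prefs):
    """Expand dotted preference keys into the nested layout of the Preferences file."""
    nested = {}
    for dotted_key, value in flat_prefs.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Walk down, creating intermediate objects such as "session".
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "session.restore_on_startup": 4,
    "session.startup_urls": ["http://localhost/test1"],
}
print(json.dumps(nest_prefs(flat), indent=2))
```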
I am running the Chrome driver over Selenium on a Ubuntu server behind a residential proxy network. Yet, my Selenium is being detected. Is there a way to make the Chrome driver and Selenium 100% undetectable?
I have been trying for so long I lost track of the many things I have done including:
Trying different versions of Chrome
Adding several flags and removing some words from the Chrome driver file.
Running it behind a proxy (residential ones also) using incognito mode.
Loading profiles.
Random mouse movements.
Randomising everything.
I am looking for a true version of Selenium that is 100% undetectable. If that ever existed. Or another automation way that is not detectable by bot trackers.
This is part of the starting of the browser:
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)
display = Display(visible=0, size=(sx,sn))
display.start()
randagent = random.randint(0,len(useragents_desktop)-1)
uag = useragents_desktop[randagent]
#this is to prevent ip leaking
preferences = {
    "webrtc.ip_handling_policy" : "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled" : False
}
chrome_options.add_experimental_option("prefs", preferences)
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
wsize = "--window-size=" + str(sx-10) + ',' + str(sn-10)
chrome_options.add_argument(str(wsize) )
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)#this is for the residential proxy
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
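One thing worth checking in the snippet above: as far as I can tell from the Selenium Python bindings, add_experimental_option stores the value under the given name, so the second "prefs" call would replace the first and the WebRTC settings would be dropped. A minimal sketch of merging both dictionaries before a single call (merge_prefs is a hypothetical helper):

```python
def merge_prefs(*pref_dicts):
    """Combine several Chrome preference dicts; later dicts win on duplicate keys."""
    merged = {}
    for prefs in pref_dicts:
        merged.update(prefs)
    return merged


webrtc_prefs = {
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
}
image_prefs = {"profile.managed_default_content_settings.images": 2}

all_prefs = merge_prefs(webrtc_prefs, image_prefs)
# Then pass everything in one call:
# chrome_options.add_experimental_option("prefs", all_prefs)
```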
Whether a Selenium-driven WebDriver gets detected doesn't depend on any specific Selenium, Chrome or ChromeDriver version. The websites themselves can inspect the network traffic and identify the browser client, i.e. the web browser, as WebDriver-controlled.
However some generic approaches to avoid getting detected while web-scraping are as follows:
The first and foremost attribute a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, you need to keep on changing the user-agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
@Antoine Vastel, in his blog post Detecting Chrome Headless, describes several approaches that distinguish the Chrome browser from a headless Chrome browser.
User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
A check for the presence of Chrome headless can be done through:
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
console.log("Chrome headless detected");
}
Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. On the opposite, in headless mode, the array returned contains no plugin.
A check for the presence of Plugins can be done through:
if(navigator.plugins.length == 0) {
console.log("It may be Chrome headless");
}
Languages: In Chrome two Javascript attributes enable to obtain languages used by the user: navigator.language and navigator.languages. The first one is the language of the browser UI, while the second one is an array of string representing the user’s preferred languages. However, in headless mode, navigator.languages returns an empty string.
A check for the presence of Languages can be done through:
if(navigator.languages == "") {
console.log("Chrome headless detected");
}
WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver. With a vanilla Chrome and Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc.. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without using any sort of window system and Brian Paul, which is the program that started the open source Mesa graphics library.
A check for the presence of WebGL can be done through:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
console.log("Chrome headless detected");
}
Not all Chrome headless will have the same values for vendor and renderer. Others keep values that could also be found on non headless version. However, Mesa Offscreen and Brian Paul indicates the presence of the headless version.
Browser features: Modernizr library enables to test if a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.
A check for the presence of hairline feature can be done through:
if(!Modernizr["hairline"]) {
console.log("It may be Chrome headless");
}
Missing image: The last on our list also seems to be the most robust, comes from the dimension of the image used by Chrome in case an image cannot be loaded. In case of a vanilla Chrome, the image has a width and height that depends on the zoom of the browser, but are different from zero. In a headless Chrome, the image has a width and an height equal to zero.
A check for the presence of Missing image can be done through:
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
image.src = "http://iloveponeydotcom32188.jg";
image.setAttribute("id", "fakeimage");
body.appendChild(image);
image.onerror = function(){
if(image.width == 0 && image.height == 0) {
console.log("Chrome headless detected");
}
}
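To see which of these signals your own Selenium session trips, the checks can be run from Python through execute_script. This is only a sketch covering the three simplest checks quoted above; HEADLESS_CHECKS, run_headless_checks and tripped are hypothetical names, and driver is assumed to be any WebDriver instance.

```python
# JavaScript expressions taken from the checks above, wrapped in "return"
# so that execute_script hands the result back to Python.
HEADLESS_CHECKS = {
    "user_agent": "return /HeadlessChrome/.test(window.navigator.userAgent);",
    "no_plugins": "return navigator.plugins.length == 0;",
    "no_languages": 'return navigator.languages == "";',
}


def run_headless_checks(driver, checks=HEADLESS_CHECKS):
    """Return {check name: bool}; True means the check flags the browser."""
    return {name: bool(driver.execute_script(js)) for name, js in checks.items()}


def tripped(results):
    """List the names of the checks that flagged the browser, sorted."""
    return sorted(name for name, flagged in results.items() if flagged)
```

Calling tripped(run_headless_checks(driver)) after loading any page gives a quick list of signals to work on.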
References
You can find a couple of similar discussions in:
How to bypass Google captcha with Selenium and python?
How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
tl; dr
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
Why not try undetected-chromedriver?
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io Automatically downloads the driver binary and patches it.
Tested until current chrome beta versions
Works also on Brave Browser and many other Chromium based browsers
Python 3.6++
you can install it with: pip install undetected-chromedriver
There are important things you should be aware of:
Due to the inner workings of the module, you need to browse programmatically (i.e. using .get(url)). Never use the GUI to navigate. Using your keyboard and mouse for navigation may cause detection! New tabs: same story. If you really need multiple tabs, then open the tab with the blank page (hint: the URL is data:, including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (actually it's the default behaviour), then you will have a great time for now.
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None # Undetectable!
For Python with Chrome or Chromium-based browsers, there's Selenium-Profiles
It currently supports:
Overwrite device metrics with fake-profiles
Mobile and Desktop emulation
Undetected by Google, Cloudflare, ..
Modifying headers supported using Selenium-Interceptor
Touch Actions
proxies with authentication
making single POST, GET or other requests using driver.requests.fetch(url, options) (syntax)
Installation
pip install selenium-profiles
Example script
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles
mydriver = mydriver()
driver = mydriver.start(profiles.Windows()) # or .Android
# get url
driver.get('https://nowsecure.nl/#relax') # test undetectability
input("Press ENTER to exit: ")
driver.quit() # Execute on the End!
Notes:
The package is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which means that if you want to use it for something commercial, you need to ask the author first.
headless support currently isn't guaranteed, but you can use pyvirtualdisplay
What about:
import random
from selenium import webdriver
import time
driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")
username = "username" # data_user
password = "password" # data_pass
driver.get("https://www.depop.com/login/") # get URL
driver.find_element_by_xpath('/html/body/div[1]/div/div[3]/div[2]/button[2]').click() # Accept cookies
split_char_pw = [] # Empty lists
split_char = []
n = 1 # Splitter
for index in range(0, len(username), n):
    split_char.append(username[index: index + n])
for user_letter in split_char:
    time.sleep(random.uniform(0.1, 0.8))  # random pause between keystrokes
    driver.find_element_by_id("username").send_keys(user_letter)
for index in range(0, len(password), n):
    split_char_pw.append(password[index: index + n])  # password characters go in their own list
for pw_letter in split_char_pw:
    time.sleep(random.uniform(0.1, 0.8))
    driver.find_element_by_id("password").send_keys(pw_letter)
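The same per-character typing can be written without the intermediate lists, since iterating over a string already yields one character at a time. A rough sketch; type_like_human is a hypothetical helper, and element is assumed to be anything with a send_keys method (such as a Selenium WebElement). The sleep and rng parameters exist only so the function can be exercised without real delays.

```python
import random
import time


def type_like_human(element, text, min_delay=0.1, max_delay=0.8,
                    sleep=time.sleep, rng=random):
    """Send text one character at a time with a random pause before each key."""
    for char in text:
        sleep(rng.uniform(min_delay, max_delay))
        element.send_keys(char)
```

With the code above this would be called as type_like_human(driver.find_element_by_id("username"), username), which also avoids looking the element up again for every character.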
I have a website that I want to log in to, and the login should persist across multiple sessions.
I tried using pickle to save the cookies once logged in and then load them when running the script again,
but this doesn't work: the website logs me out.
So I tried to set a custom profile for Firefox,
but checking which profile it is running on by adding this code
# 2- get tmp file location
profiletmp = driver.firefox_profile.path
# but... the current profile is a copy of the original profile :/
print("running profile " + profiletmp)
shows me that I'm running in a temporary directory copied from the main profile: /tmp/xxxxxxxx/webdriver-py-copy
So, following the answer here (Python / Selenium / Firefox: Can't start firefox with specified profile path), I tried copying that folder over the main profile, but all changes are still lost when the browser loads (for example, I made a bookmark and logged into a site, then copied the folder over the main profile, replacing all the files, yet when opening Firefox the changes are gone).
I can only retain changes if I open the Firefox profile manually and log in without geckodriver; after that, when geckodriver loads the page, I'm logged in.
With chromedriver, adding a custom profile location retains the data, but I have different issues with Chrome and I want to use Firefox.
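For reference, the pickle approach mentioned above usually looks something like the sketch below. It relies on the driver's get_cookies, add_cookie and refresh methods; save_cookies and load_cookies are hypothetical helper names. Note the assumption that the browser has to be on the cookie's domain before add_cookie is called, and that a site can still reject the session if it ties the login to more than cookies.

```python
import pickle


def save_cookies(driver, path):
    """Dump the current session's cookies to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(driver.get_cookies(), fh)


def load_cookies(driver, path, url):
    """Restore cookies from a pickle file and return how many were added."""
    # Load the site first: cookies can only be added for the current domain.
    driver.get(url)
    with open(path, "rb") as fh:
        cookies = pickle.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    # Reload so the page is requested again with the restored cookies.
    driver.refresh()
    return len(cookies)
```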
I'm trying to make a Selenium program to automatically download and upload some files.
Note that I am not doing this for testing but for trying to automate some tasks.
So here's my set_preference for the Firefox profile
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/home/jj/web')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/json, text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream')
profile.set_preference("browser.helperApps.alwaysAsk.force", False);
Yet, I still see the dialog for download.
The Selenium Firefox WebDriver runs the Firefox browser GUI. When a download is invoked, Firefox presents a popup asking whether you want to view the file or save it. As far as I can tell this is a property of the browser, and there is no way to disable it using the Firefox preferences or by setting the Firefox profile variables. The only way I could avoid the Firefox download popup was to use Mechanize along with Selenium. I used Selenium to obtain the download link and then passed this link to Mechanize to perform the actual download. Mechanize is not associated with a GUI implementation and therefore does not present user interface popups.
This clip is in Python and is part of a class that will perform the download action.
# These imports are required
from selenium import webdriver
import mechanize
import time
# Start the firefox browser using Selenium
self.driver = webdriver.Firefox()
# Load the download page using its URL.
self.driver.get(self.dnldPageWithKey)
time.sleep(3)
# Find the download link and click it
elem = self.driver.find_element_by_id("regular")
dnldlink = elem.get_attribute("href")
logfile.write("Download Link is: " + dnldlink)
pos = dnldlink.rfind("/")
dnldFilename = dnldlink[pos+1:]
dnldFilename = "/home/<mydir>/Downloads/" + dnldFilename
logfile.write("Download filename is: " + dnldFilename)
#### Now Using Mechanize ####
# Above, Selenium retrieved the download link. Because of Selenium's
# firefox download issue: it presents a download dialog that requires
# user input, Mechanize will be used to perform the download.
# Setup the mechanize browser. The browser does not get displayed.
# It is managed behind the scenes.
br = mechanize.Browser()
# Open the login page, the download requires a login
resp = br.open(webpage.loginPage)
# Select the form to use on this page. There is only one, it is the
# login form.
br.select_form(nr=0)
# Fill in the login form fields and submit the form.
br.form['login_username'] = theUsername
br.form['login_password'] = thePassword
br.submit()
# The page returned after the submit is a transition page with a link
# to the welcome page. In a user interactive session the browser would
# automatically switch us to the welcome page.
# The first link on the transition page will take us to the welcome page.
# This step may not be necessary, but it puts us where we should be after
# logging in.
br.follow_link(nr=0)
# Now download the file
br.retrieve(dnldlink, dnldFilename)
# After the download, close the Mechanize browser; we are done.
br.close()
This does work for me. I hope it helps. If there is an easier solution I would love to know it.
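A possible variation, swapping Mechanize for the standard library: instead of logging in a second time, the cookies from the Selenium session can be copied onto a urllib request. This is only a sketch and assumes the site authenticates the download purely through cookies; cookie_header, build_download_request and download are hypothetical helpers.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list returned by driver.get_cookies()."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in selenium_cookies)


def build_download_request(url, selenium_cookies, user_agent=None):
    """Create a urllib request that carries the browser session's cookies."""
    request = urllib.request.Request(url)
    if selenium_cookies:
        request.add_header("Cookie", cookie_header(selenium_cookies))
    if user_agent:
        request.add_header("User-Agent", user_agent)
    return request


def download(url, selenium_cookies, destination):
    """Fetch url with the session cookies and write the body to destination."""
    request = build_download_request(url, selenium_cookies)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        out.write(response.read())
```

In the clip above this would be called as download(dnldlink, self.driver.get_cookies(), dnldFilename) once Selenium has logged in.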
I am working on python and selenium. I want to download file from clicking event using selenium. I wrote following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from links with name "Export Data" from given url. How can I achieve it as it works with click event only?
Find the link using find_element(s)_by_*, then call click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.
I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
window.file_contents = reader.result;
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the file retrieved base64 encoded (perfect for downloading binary)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    print(downloaded_file)
    if not downloaded_file:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explanation
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-origin (same-origin policy) restrictions.
Next, we inject some JavaScript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we take the response and load it into a FileReader object. From there we can extract the base64 encoded content of the file by calling readAsDataURL(). We then take the base64 encoded content and attach it to window, a globally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.
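The polling loop in the script above has no upper bound, so it would spin forever if the request fails. A sketch of the same wait with a timeout, plus the base64 decoding step; wait_for_data_url and decode_data_url are hypothetical helpers, and window.file_contents is the variable set by the injected script. The clock and sleep parameters are there only to make the function easy to test.

```python
import base64
import time


def wait_for_data_url(driver, timeout=30.0, poll=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll window.file_contents until it is set, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        data_url = driver.execute_script("return window.file_contents;")
        if data_url:
            return data_url
        if clock() >= deadline:
            raise TimeoutError("file was not retrieved within %.1f s" % timeout)
        sleep(poll)


def decode_data_url(data_url):
    """Return the raw bytes from a 'data:<mime>;base64,<payload>' string."""
    _header, _sep, payload = data_url.partition(",")
    return base64.b64decode(payload)
```

The result of decode_data_url(wait_for_data_url(driver)) can then be written to disk in binary mode as in the original script.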
In chrome what I do is downloading the files by clicking on the links, then I open chrome://downloads page and then retrieve the downloaded files list from shadow DOM like this:
docs = document
.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloads-list')
.getElementsByTagName('downloads-item')
This solution is restricted to Chrome. The data also contains information like the file path and download date. (Note that this code is JavaScript; from Python it would have to be run through the driver's execute_script.)
Here is the full working code. You can use web scraping to enter the username, password and other fields. To get the field names that appear on the web page, use Inspect Element. An element (username, password or the button to click) can be located by class or by name.
from selenium import webdriver
# Using Chrome to access web
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {"download.default_directory": "C:/Test"}) # Set the download path (this is a preference, not a command-line switch)
driver = webdriver.Chrome(options=options)
# Open the website
try:
    driver.get('xxxx') # Your Website Address
    password_box = driver.find_element_by_name('password')
    password_box.send_keys('xxxx') #Password
    download_button = driver.find_element_by_class_name('link_w_pass')
    download_button.click()
    driver.quit()
except:
    driver.quit()
    print("Faulty URL")