chrome options in robot framework - python

I am trying to download a file from a link on a webpage, but I keep getting the annoying warning "This type of file can harm...anyway? Keep, Discard". I have tried several options to suppress the warning but still get it. I am using Robot Framework, with Python to create a new keyword.
@keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument("--disable-web-security")
    options.add_argument("--allow-running-insecure-content")
    options.add_argument("--safebrowsing-disable-extension-blacklist")
    options.add_argument("--safebrowsing-disable-download-protection")
    prefs = {'safebrowsing.enabled': 'true'}
    options.add_experimental_option("prefs", prefs)
    self.open_browser(url, 'chrome', alias=None, remote_url=False,
                      desired_capabilities=options.to_capabilities(),
                      ff_profile_dir=None)
Can someone please suggest a way to disable the download warning?

I found an answer after some research. For some reason (maybe a bug), open_browser does not set capabilities for Chrome.
So the alternative is to use create_webdriver. It worked with the following code:
@keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument("--disable-web-security")
    options.add_argument("--allow-running-insecure-content")
    options.add_argument("--safebrowsing-disable-extension-blacklist")
    options.add_argument("--safebrowsing-disable-download-protection")
    prefs = {'safebrowsing.enabled': 'true'}
    options.add_experimental_option("prefs", prefs)
    instance = self.create_webdriver('Chrome', desired_capabilities=options.to_capabilities())
    self.go_to(url)

You need to add all the arguments to a list, wrap that list in a dictionary, and pass it to Open Browser. For example:
${list}=    Create List    --start-maximized    --disable-web-security
${args}=    Create Dictionary    args=${list}
${desired caps}=    Create Dictionary    platform=${OS}    chromeOptions=${args}
Open Browser    https://www.google.com    remote_url=${grid_url}    browser=${BROWSER}    desired_capabilities=${desired caps}

A simpler solution:
Open Browser    ${URL}    ${BROWSER}    options=add_argument("--disable-notifications")
For multiple options, separate them with a semicolon:
options=add_argument("--disable-popup-blocking"); add_argument("--ignore-certificate-errors")

It is better not to disable any security features or other defaults that come with the browser (unless there is ample justification) just to solve one problem; it is better to find a solution that does not touch them at all. Just make use of the requests module in Python and wrap it as a keyword you can reuse wherever you want across your codebase. The reasoning: it is better to get the job done with a ubiquitous module than to spend extensive amounts of time fighting one module, which I used to do. Better to install requests (plus the robotframework-requests library and friends) and just get the job completed.
Use the code below to create a keyword out of it and call it wherever you want, instead of going through the hassle of fixing browser behavior (a keyword-wrapping sketch follows the snippet).
import requests

file_url = "http://www.africau.edu/images/default/sample.pdf"
r = requests.get(file_url, stream=True)
with open("sample.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # writing one chunk at a time to the pdf file
        if chunk:
            pdf.write(chunk)
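Here is a minimal sketch of that keyword wrapper, as a Python library for Robot Framework; the class name, keyword name, and chunk size are illustrative, not from the original answer:

import requests
from robot.api.deco import keyword


class DownloadLibrary:
    """Hypothetical keyword library wrapping the requests snippet above."""

    @keyword('download "${url}" to "${path}"')
    def download_file(self, url, path):
        # stream the response so large files are never held fully in memory
        r = requests.get(url, stream=True)
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)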

This worked for me (must use SeleniumLibrary 4). Modify Chrome so it downloads PDFs instead of viewing them:
${chrome_options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
${disabled}=    Create List    Chrome PDF Viewer    PrintFileServer
${prefs}=    Create Dictionary    download.prompt_for_download=${FALSE}    plugins.always_open_pdf_externally=${TRUE}    plugins.plugins_disabled=${disabled}
Call Method    ${chrome_options}    add_experimental_option    prefs    ${prefs}
${desired_caps}=    Create Dictionary    browserName=${browserName}    version=${version}    platform=${platform}    screenResolution=${screenResolution}    record_video=${record_video}    record_network=${record_network}    build=${buildNum}    name=${globalTestName}
Open Browser    url=${LOGINURL}    remote_url=${remote_url}    options=${chrome_options}    desired_capabilities=${desired_caps}

Related

How to login to website which is detecting bot usage using Selenium [duplicate]

I am running ChromeDriver with Selenium on an Ubuntu server behind a residential proxy network, yet my Selenium is being detected. Is there a way to make ChromeDriver and Selenium 100% undetectable?
I have been trying for so long that I have lost track of the many things I have done, including:
Trying different versions of Chrome
Adding several flags and removing some words from the Chrome driver file.
Running it behind a proxy (residential ones also) using incognito mode.
Loading profiles.
Random mouse movements.
Randomising everything.
I am looking for a version of Selenium that is truly 100% undetectable, if that ever existed, or another automation approach that bot trackers cannot detect.
This is part of the starting of the browser:
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)
display = Display(visible=0, size=(sx,sn))
display.start()
randagent = random.randint(0,len(useragents_desktop)-1)
uag = useragents_desktop[randagent]
# this is to prevent IP leaking through WebRTC
preferences = {
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False,
}
chrome_options.add_experimental_option("prefs", preferences)
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
wsize = "--window-size=" + str(sx - 10) + ',' + str(sn - 10)
chrome_options.add_argument(wsize)
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)  # this is for the residential proxy
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
The fact that a Selenium-driven WebDriver gets detected doesn't depend on any specific Selenium, Chrome or ChromeDriver version. Websites themselves can inspect the network traffic and identify the browser client, i.e. the web browser, as WebDriver-controlled.
However, some generic approaches to avoid getting detected while web scraping are as follows:
The first and foremost attribute by which a website can identify your script/program is the monitor size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, keep changing the user agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions, by inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds. A combined sketch of these three tips follows.
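Here is that sketch; the user-agent strings and timing ranges are placeholders, not values recommended by the discussions above:

import random
import time

from selenium import webdriver

# placeholder pool; in practice keep a larger, up-to-date list
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

options = webdriver.ChromeOptions()
# avoid the conventional viewport by randomizing the window size slightly
options.add_argument("--window-size=%d,%d" % (random.randint(1100, 1400), random.randint(750, 950)))
options.add_argument("--user-agent=" + random.choice(user_agents))

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
time.sleep(random.uniform(1.5, 4.0))  # human-like pause between actions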
@Antoine Vastel, in his blog post Detecting Chrome Headless, mentioned several approaches that distinguish the Chrome browser from a headless Chrome browser.
User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
A check for the presence of Chrome headless can be done through:
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
    console.log("Chrome headless detected");
}
Plugins: navigator.plugins returns an array of the plugins present in the browser. Typically, on Chrome we find default plugins such as Chrome PDF Viewer or Google Native Client. By contrast, in headless mode, the returned array contains no plugins.
A check for the presence of Plugins can be done through:
if (navigator.plugins.length == 0) {
    console.log("It may be Chrome headless");
}
Languages: in Chrome, two JavaScript attributes expose the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. However, in headless mode, navigator.languages returns an empty string.
A check for the presence of Languages can be done through:
if(navigator.languages == "") {
console.log("Chrome headless detected");
}
WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query the vendor of the graphics driver as well as its renderer. With a vanilla Chrome on Linux, we obtain Google SwiftShader and Google Inc. as renderer and vendor. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without any sort of window system, and Brian Paul, who started the open source Mesa graphics library.
A check for the presence of WebGL can be done through:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
if (vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
    console.log("Chrome headless detected");
}
Not all headless Chrome instances will have the same values for vendor and renderer; others keep values that could also be found in the non-headless version. However, Mesa OffScreen and Brian Paul indicate the headless version.
Browser features: the Modernizr library tests whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter lacked the hairline feature, which detects support for hidpi/retina hairlines.
A check for the presence of hairline feature can be done through:
if(!Modernizr["hairline"]) {
console.log("It may be Chrome headless");
}
Missing image: the last item on our list also seems to be the most robust. It relies on the dimensions of the placeholder image Chrome uses when an image cannot be loaded. In a vanilla Chrome, that image has a width and height that depend on the browser's zoom level but are different from zero. In headless Chrome, the image has a width and height equal to zero.
A check for the presence of Missing image can be done through:
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
image.src = "http://iloveponeydotcom32188.jg";
image.setAttribute("id", "fakeimage");
body.appendChild(image);
image.onerror = function() {
    if (image.width == 0 && image.height == 0) {
        console.log("Chrome headless detected");
    }
};
References
You can find a couple of similar discussions in:
How to bypass Google captcha with Selenium and python?
How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
tl; dr
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
Why not try undetected-chromedriver?
An optimized Selenium ChromeDriver patch which does not trigger anti-bot services like Distil Network / Imperva / DataDome / Botprotect.io; it automatically downloads the driver binary and patches it.
Tested up to current Chrome beta versions
Also works on Brave Browser and many other Chromium-based browsers
Python 3.6+
You can install it with: pip install undetected-chromedriver
There are important things you should be aware of:
Due to the inner workings of the module, you need to browse programmatically (i.e. using .get(url)). Never use the GUI to navigate; using your keyboard and mouse for navigation risks detection. New tabs: same story. If you really need multiple tabs, open the tab with a blank page (hint: the url is data:, including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (which are actually the default behaviour), you will have a great time for now.
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True  # Detectable
In [4]: driver.get('https://distilnetworks.com')  # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None  # Undetectable!
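As a sketch of the "new tab" advice above (navigate programmatically, and open tabs on the blank data:, page rather than with the mouse), assuming the same uc.Chrome() driver:

import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get('https://example.com')

# open a new tab programmatically on the blank "data:," page,
# then switch to it and navigate with .get() as usual
driver.execute_script("window.open('data:,', '_blank')")
driver.switch_to.window(driver.window_handles[-1])
driver.get('https://example.org')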
For Python with Chrome or Chromium-based browsers, there's Selenium-Profiles
It currently supports:
Overwriting device metrics with fake profiles
Mobile and desktop emulation
Undetected by Google, Cloudflare, etc.
Modifying headers (supported using Selenium-Interceptor)
Touch actions
Proxies with authentication
Making single POST, GET or other requests using driver.requests.fetch(url, options) (syntax)
Installation
pip install selenium-profiles
Example script
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles

mydriver = mydriver()
driver = mydriver.start(profiles.Windows())  # or .Android

# get url
driver.get('https://nowsecure.nl/#relax')  # test undetectability

input("Press ENTER to exit: ")
driver.quit()  # Execute at the end!
Notes:
The package is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, meaning that if you want to use it for something commercial, you need to ask the author first.
Headless support currently isn't guaranteed, but you can use pyvirtualdisplay (a sketch follows).
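A minimal sketch of the pyvirtualdisplay route, assuming an X11 environment with Xvfb installed; the selenium_profiles calls mirror the example script above:

from pyvirtualdisplay import Display
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles

# start an in-memory X display so the browser runs without a visible
# window, instead of relying on Chrome's headless mode
display = Display(visible=0, size=(1280, 1024))
display.start()

mydriver = mydriver()
driver = mydriver.start(profiles.Windows())
driver.get('https://nowsecure.nl/#relax')
driver.quit()
display.stop()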
What about:
import random
import time

from selenium import webdriver

driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")

username = "username"  # data_user
password = "password"  # data_pass

driver.get("https://www.depop.com/login/")  # get URL
driver.find_element_by_xpath('/html/body/div[1]/div/div[3]/div[2]/button[2]').click()  # Accept cookies

split_char_pw = []  # Empty lists
split_char = []
n = 1  # Splitter

for index in range(0, len(username), n):
    split_char.append(username[index: index + n])
for user_letter in split_char:
    time.sleep(random.uniform(0.1, 0.8))  # type with human-like pauses
    driver.find_element_by_id("username").send_keys(user_letter)

for index in range(0, len(password), n):
    split_char_pw.append(password[index: index + n])  # bug fix: the original appended to split_char, leaving split_char_pw empty
for pw_letter in split_char_pw:
    time.sleep(random.uniform(0.1, 0.8))
    driver.find_element_by_id("password").send_keys(pw_letter)

How to use Crawlera with selenium (Python, Chrome, Windows) without Polipo

So basically I am trying to use the Crawlera proxy from Scrapinghub with Selenium Chrome on Windows, using Python.
I checked the documentation and they suggested using Polipo, like this:
1) Add the following lines to /etc/polipo/config:
parentProxy = "proxy.crawlera.com:8010"
parentAuthCredentials = "<CRAWLERA_APIKEY>:"
2) Add this to the Selenium driver:
polipo_proxy = "127.0.0.1:8123"
proxy = Proxy({
'proxyType': ProxyType.MANUAL,
'httpProxy': polipo_proxy,
'ftpProxy' : polipo_proxy,
'sslProxy' : polipo_proxy,
'noProxy' : ''
})
capabilities = dict(DesiredCapabilities.CHROME)
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(desired_capabilities=capabilities)
Now I'd like to drop Polipo and use the proxy directly. Is there a way to replace the polipo_proxy variable with the Crawlera one? Each time I try, it isn't taken into account and the browser runs without a proxy.
The Crawlera proxy format is the following: [API KEY]:@[HOST]:[PORT]
I tried adding the proxy using the following line:
chrome_options.add_argument('--proxy-server=http://[API KEY]:@[HOST]:[PORT]')
but the problem is that I need to specify HTTP and HTTPS differently.
Thank you in advance!
Polipo is no longer maintained, hence there are challenges in using it. Crawlera requires authentication, which ChromeDriver does not seem to support as of now. You can try the Firefox webdriver instead: there you can set the proxy authentication in a custom Firefox profile and use that profile, as shown in Running selenium behind a proxy server and http://toolsqa.com/selenium-webdriver/http-proxy-authentication/.
I have been suffering from the same problem and this gave me some relief. Hope it helps you as well. To solve the problem, use the Firefox driver and its profile to set the proxy information this way:
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.http", "proxy.server.address")
profile.set_preference("network.proxy.http_port", port_number)  # note: the port must be an integer, not a string
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile)
This totally worked for me. For reference you can use the sites above.
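Note that those preferences only cover plain HTTP; since the question needs HTTPS as well, the SSL preferences can be set the same way (a sketch with the same placeholder host and port):

# route HTTPS traffic through the same proxy (placeholder values)
profile.set_preference("network.proxy.ssl", "proxy.server.address")
profile.set_preference("network.proxy.ssl_port", port_number)  # integer
profile.update_preferences()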
Scrapinghub has since created a new project for this: you set up a local forwarding proxy using your API key, then point the webdriver at that proxy. The project is zyte-smartproxy-headless-proxy; you can have a look.
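A sketch of that setup, assuming the forwarding proxy is already running locally (the 3128 port here is only an example; use whatever you configured):

from selenium import webdriver

options = webdriver.ChromeOptions()
# point Chrome at the local headless proxy, which adds the Crawlera
# authentication itself, so Chrome needs no credentials of its own
options.add_argument('--proxy-server=http://127.0.0.1:3128')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')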

Cannot use chrome cast button with selenium

I am trying to use Selenium to cast YouTube videos to my Chromecast. When I open YouTube in Chrome normally, I see the cast button and it works fine. When I open it with Selenium, the cast button is missing, and when I select Cast from the menu it gives me the error "No Cast destinations found. Need help?"
I am using Python and have tried lots of combinations of flags with webdriver. Here is what I have:
options = webdriver.ChromeOptions()
options.add_argument('--user-data-dir=./ChromeProfile')
options.add_argument('--disable-session-crashed-bubble')
options.add_argument('--disable-save-password-bubble')
options.add_argument('--disable-permissions-bubbles')
options.add_argument('--bwsi')
options.add_argument('--load-media-router-component-extension')
options.add_argument('--enable-video-player-chromecast-support')
excludeList = [
    'disable-component-update',
    'ignore-certificate-errors',
]
options.add_experimental_option('excludeSwitches', excludeList)
chromedriverPath = '/my/path/to/chromedriver'
driver = webdriver.Chrome(chromedriverPath, chrome_options=options)
path = 'https://www.youtube.com/watch?v=Bz9Lza059NU'
driver.get(path)
time.sleep(60)  # Let the user actually see something!
driver.quit()
I figured out how to get it working. It required two steps: copying my default profile to somewhere Selenium could use it, and figuring out the correct flags to use when opening Chrome. The key point is that Selenium automatically adds a bunch of flags that I didn't want, so I had to exclude one.
First, to find out where my profile is stored, I opened Chrome at the url chrome://version/.
This gave me lots of information, but the important entries were:
Command Line: /usr/lib/chromium-browser/chromium-browser --enable-pinch --flag-switches-begin --flag-switches-end
Profile Path: /home/mdorrell/.config/chromium/Default
First I copied my profile to a directory Selenium could use:
cp -R /home/mdorrell/.config/chromium/Default/* /home/mdorrell/ChromeProfile
Then I opened the same chrome://version page in the browser launched by Selenium and got the list of flags Selenium added from the Command Line row. The one that ended up causing my problems was --disable-default-apps.
In the end, the code I needed looked like this:
options = webdriver.ChromeOptions()
# Set the user data directory
options.add_argument('--user-data-dir=/home/mdorrell/ChromeProfile')
# Get the list of flags Selenium adds that we want to exclude
excludeList = [
    'disable-default-apps',
]
options.add_experimental_option('excludeSwitches', excludeList)
chromedriverPath = '/my/path/to/chromedriver'
driver = webdriver.Chrome(chromedriverPath, chrome_options=options)
path = 'https://www.youtube.com/watch?v=Bz9Lza059NU'
driver.get(path)
time.sleep(60)  # Let the user actually see something!
driver.quit()
Thanks @MikeD for sharing your answer.
I was having the same issue when I wanted to Chromecast an R Shiny dashboard via a Selenium browser (with RSelenium). If I clicked on Cast it would show me "No Cast destinations found. Need help?", whereas from a normal browser it worked fine.
In my case, it worked after excluding two switches (including the ChromeProfile was not necessary), which in R can be done with:
library(RSelenium)
options <- list()
options$chromeOptions$excludeSwitches <- list('disable-background-networking',
'disable-default-apps')
rD <- rsDriver(verbose = FALSE, port = 4570L, extraCapabilities = options)

Python web scraping gives wrong source code

I want to extract some data from Amazon (link in the following code).
Here is my code:
import urllib2
url="http://www.amazon.com/s/ref=sr_nr_n_11?rh=n%3A283155%2Cn%3A%2144258011%2Cn%3A2205237011%2Cp_n_feature_browse-bin%3A2656020011%2Cn%3A173507&bbn=2205237011&sort=titlerank&ie=UTF8&qid=1393984161&rnid=1000"
webpage=urllib2.urlopen(url).read()
doc=open("test.html","w")
doc.write(webpage)
doc.close()
When I open test.html, the content of my page is different from the website on the Internet.
The page involves JavaScript execution, while urllib2.urlopen(...).read() simply reads the URL content, so they differ.
To get the same content, you need to use a library that can execute JavaScript.
For example, the following code uses Selenium:
from selenium import webdriver

url = 'http://www.amazon.com/s/ref=sr_nr_n_11?...161&rnid=1000'
driver = webdriver.Firefox()
driver.get(url)
with open('test.html', 'w') as f:
    f.write(driver.page_source.encode('utf-8'))
driver.quit()
To complete falsetru's answer: another solution is to use python-ghost, which is based on Qt. It's much heavier to install, so I'd advise Selenium instead.
Using Firefox will open a browser window on script execution. To keep it out of your way, use PhantomJS:
apt-get install nodejs # you get npm, the Node Package Manager
npm install -g phantomjs # install globally
[…]
driver = webdriver.PhantomJS()

How can I download a file on a click event using selenium?

I am working with Python and Selenium. I want to download a file via a click event using Selenium. I wrote the following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from the links named "Export Data" on the given url. How can I achieve this, given that the download only works via a click event?
Find the link using find_element(s)_by_*, then call click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.
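For Chrome, a roughly equivalent setup goes through experimental prefs rather than a profile; a sketch, with the download directory and pref values assumed from the other answers on this page rather than tested against this site:

from selenium import webdriver

options = webdriver.ChromeOptions()
# rough Chrome counterpart of the Firefox profile above
options.add_experimental_option("prefs", {
    "download.default_directory": "/tmp",   # custom location
    "download.prompt_for_download": False,  # no download dialog
    "safebrowsing.enabled": True,
})
browser = webdriver.Chrome(options=options)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")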
I'll admit this solution is a little more "hacky" than the Firefox profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature that could change at any time. And if nothing else, maybe this will give someone a slightly different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
window.file_contents = reader.result;
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the file retrieved, base64 encoded (perfect for binary downloads)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    print(downloaded_file)
    if not downloaded_file:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explanation
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-site scripting issues.
Next, we inject some JavaScript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we load it into a FileReader object. From there we can extract the base64-encoded content of the file by calling readAsDataURL(). We then take the base64-encoded content and append it to window, a globally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to window. Once it is appended, we decode the base64 content retrieved from window and save it to a file.
This solution should work across all modern browsers supported by Selenium, whether the content is text or binary, and across all MIME types.
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the presence of that element as the trigger to retrieve the downloaded file.
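A sketch of that alternate approach; the marker element id and the JS completion hook are illustrative assumptions, not tested code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# in the injected JS, signal completion by appending a marker element
# inside reader.onloadend, e.g.:
#   var done = document.createElement('div');
#   done.id = 'download-finished';  // illustrative id
#   document.body.appendChild(done);

# then, on the Python side, block until the marker appears
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, 'download-finished'))
)
downloaded_file = driver.execute_script(
    "return window.file_contents.split(',')[1];")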
In Chrome, what I do is download the files by clicking on the links, then open the chrome://downloads page and retrieve the downloaded file list from the shadow DOM like this:
docs = document
.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloads-list')
.getElementsByTagName('downloads-item')
This solution is restricted to Chrome; the data also contains information like the file path and download date. (Note this code is JS, not necessarily correct Python syntax; a Python sketch follows.)
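A hedged sketch of driving that same query from Python, assuming the driver is allowed to navigate to chrome://downloads; the shadow-DOM internals vary across Chrome versions, so the selectors may need adjusting:

# navigate to the downloads page, then evaluate the same shadow-DOM
# query from Python; execute_script returns the matched elements
driver.get('chrome://downloads')
docs = driver.execute_script("""
    return document
        .querySelector('downloads-manager')
        .shadowRoot.querySelector('#downloads-list')
        .getElementsByTagName('downloads-item');
""")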
Here is the full working code. You can use web scraping to enter the username, password and other fields. To get the names of the fields appearing on the webpage, use inspect element. The element (Username, Password or Click button) can be located through its class or name.
from selenium import webdriver

# Using Chrome to access web
options = webdriver.ChromeOptions()
# Set the download path; note this must be set as a preference,
# not via add_argument, for Chrome to honor it
options.add_experimental_option("prefs", {"download.default_directory": "C:/Test"})
driver = webdriver.Chrome(options=options)

# Open the website
try:
    driver.get('xxxx')  # Your Website Address
    password_box = driver.find_element_by_name('password')
    password_box.send_keys('xxxx')  # Password
    download_button = driver.find_element_by_class_name('link_w_pass')
    download_button.click()
    driver.quit()
except:
    driver.quit()
    print("Faulty URL")
