Firefox preference update is not applied with robot framework - python

I am trying to turn off the Firefox download dialog. I used this piece of Python code, which uses the selenium library. It should make the file download directly into the given path without any further prompt.
from selenium import webdriver
def disable_download_dialog(path):
    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", path)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
    fp.update_preferences()
    return fp.path
Then I call this function in my RF test like this:
${ff_profile_path}=    disable download dialog    ${EXECDIR}\\path\\to\\my\\folder
and then open the browser like this:
Open Browser    ${url}    ${browser}    ff_profile_dir=${ff_profile_path}
From the test run I can see that the download window is still displayed. The path to the folder where I want the downloaded file to go appears in the test logs like this:
D:\\path\\to\\the\\folder\\named\\Downloads
The Firefox profile really is updated and saved in the Temp folder, but it looks like it is not loaded and therefore not used by my test. The path to the Firefox profile looks like this:
C:\Users\surname~1.name\AppData\Local\Temp\tmp83d29mnz
Of course a new profile is created every time, which is not an issue. It would also be nice if I could set the path for the Firefox profile that I create with the Python function.
So the questions are:
Why is the download dialog still shown when I have disabled it?
Can the Firefox profile be saved in a folder that I define?

OK, so I found the missing piece.
I added these two lines to the Python function:
fp.set_preference("browser.helperApps.alwaysAsk.force", False)
fp.set_preference("pdfjs.disabled", True)
So the final version of the function looks like this:
def disable_download_dialog(path):
    from selenium import webdriver
    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", path)
    fp.set_preference("browser.helperApps.alwaysAsk.force", False)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
    fp.set_preference("pdfjs.disabled", True)
    fp.update_preferences()
    return fp.path
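The second question (saving the profile in a folder of your own choosing) is not answered above. One possible approach, offered as an untested sketch and not as a confirmed Robot Framework recipe, is to copy the temporary directory that fp.path points to into a folder you control and hand that folder to ff_profile_dir instead. The helper name save_profile_copy and both paths are hypothetical.

```python
import shutil
from pathlib import Path


def save_profile_copy(profile_path, target_dir):
    """Copy a generated Firefox profile directory to a folder of your choice.

    profile_path: the temporary directory returned by fp.path (assumption:
                  update_preferences() has already written user.js into it).
    target_dir:   hypothetical destination folder; created if missing.
    """
    target = Path(target_dir)
    # dirs_exist_ok lets repeated test runs overwrite the previous copy
    # (requires Python 3.8 or newer).
    shutil.copytree(profile_path, target, dirs_exist_ok=True)
    return str(target)
```

In the function above, this would mean ending with something like return save_profile_copy(fp.path, some_folder) in place of return fp.path.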

Related

Python Webscraper Download PDF in Firefox

I am programming a Python Webscraper which needs to be able to click on a download button and save a PDF to a location that is defined through an XML-File.
The problematic part of my code is the following:
profile = webdriver.FirefoxProfile()
download_Path = items.get(key = 'dir') # Get download path from XML.
if not os.path.exists(download_Path):
    os.makedirs(download_Path)
profile.set_preference("browser.helperApps.alwaysAsk.force", False)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("browser.download.manager.useWindow", False)
profile.set_preference("webdriver_enable_native_events", False)
profile.set_preference("browser.helperApps.neverAsk.openFile", "application/pdf;")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf;")
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", download_Path)
profile.update_preferences()
driver = webdriver.Firefox(executable_path = DriverPath, options = options, firefox_profile = profile)
Almost everything works fine: the download directory is changed as intended, so profile.set_preference works, but the other preferences have no effect. I have been searching for a while now and, as you can see, I tried different options so that the browser doesn't ask whether to open the file or where to save it, and just puts it in the given directory.
I solved it myself. The answer is that you have to configure the PDF reader that is integrated in Firefox ("PDF.js") separately, with the following code:
profile.set_preference("pdfjs.disabled", True)
That's it; the rest works as intended.

Why does Selenium still ask me to configure Saves when I have it set in Python already?

I'm not really a Python user, but I'm using some code that I found online to download a file. Part of the code is:
urlpage = 'https://www150.statcan.gc.ca/n1/tbl/csv/' + '10100127' + '-eng.zip'
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'D:\downloads')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/x-gzip")
driver = webdriver.Firefox()
driver.get(urlpage)
From what I can see, this should just download the file to the downloads folder on my D: drive, yet when I run the code, the webpage opens and then asks me whether I would like to view or download the file. Is there anything wrong with the code, or am I doing something wrong?
Not sure if it's important information, but I'm using PyCharm as my IDE
Here is the script that you should use. It will save the file in the system default downloads folder.
FF_options = webdriver.FirefoxProfile()
FF_options.set_preference("browser.helperApps.neverAsk.saveToDisk","application/zip")
driver= webdriver.Firefox(firefox_profile=FF_options)
If you want to save the downloaded file in a specific location, add the preferences below (set them before creating the driver).
# Change the path here; this line saves into the current working directory,
# i.e. the directory the script is run from (requires "import os").
FF_options.set_preference("browser.download.dir", os.getcwd())
FF_options.set_preference("browser.download.folderList", 2)

How to download file using remote Firefox webdriver?

I've tried to adapt several existing solutions (1, 2) to the remote Firefox webdriver running in a selenium/standalone-firefox Docker container:
options = Options()
options.set_preference('browser.download.dir', '/src/app/output')
options.set_preference('browser.download.folderList', 2)
options.set_preference('browser.download.manager.showWhenStarting', False)
options.set_preference('browser.helperApps.alwaysAsk.force', False)
options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')
options.set_preference('pdfjs.disabled', True)
options.set_preference('pdfjs.enabledCache.state', False)
options.set_preference('plugin.disable_full_page_plugin_for_types', False)
cls.driver = webdriver.Remote(
    command_executor='http://selenium:4444/wd/hub',
    desired_capabilities={'browserName': 'firefox', 'acceptInsecureCerts': True},
    options=options
)
Navigating and clicking the relevant download button works fine, but the file never appears in the download directory. I've verified everything I can think of:
The user in the Selenium container can create files in /src/app/output and those files are visible in the host OS.
I can download the file successfully using my desktop browser.
The response content type is application/pdf.
What am I missing?
It turned out other changes done while researching this were resulting in the server returning a text/plain document rather than a PDF file. For reference, this is the simplest set of options I could get to work:
options.set_preference('browser.download.dir', DOWNLOAD_DIRECTORY)
options.set_preference('browser.download.folderList', 2)
options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')
options.set_preference('pdfjs.disabled', True)
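Since the root cause here was a response type that did not match the preference, a quick check like the following can help when debugging. It is only an approximation of how the browser compares the two values, and the function name is_auto_saved is made up for this sketch.

```python
def is_auto_saved(content_type_header, never_ask_pref):
    """Return True if the response MIME type is listed in the pref value.

    content_type_header: value of the Content-Type response header.
    never_ask_pref:      value given to browser.helperApps.neverAsk.saveToDisk
                         (a comma-separated list of MIME types).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    allowed = {m.strip().lower() for m in never_ask_pref.split(",") if m.strip()}
    return mime in allowed
```

With the situation described above, is_auto_saved("text/plain", "application/pdf") is False, which is consistent with the file never being saved automatically.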

Python Selenium: Firefox neverAsk.saveToDisk when downloading from Blob URL

I wish to have Firefox using selenium for Python to download the Master data (Download, XLSX) Excel file from this Frankfurt stock exchange webpage.
The problem: I can't get Firefox to download the file without asking where to save it first.
Let me first point out that the URL I'm trying to get the Excel file from, is really a Blob URL:
http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx
Perhaps the Blob is causing my problem? Or, perhaps the problem is in my MIME handling?
from selenium import webdriver
profile_dir = "path/to/ff_profile"
dl_dir = "path/to/dl/folder"
ff_profile = webdriver.FirefoxProfile(profile_dir)
ff_profile.set_preference("browser.download.folderList", 2)
ff_profile.set_preference("browser.download.manager.showWhenStarting", False)
ff_profile.set_preference("browser.download.dir", dl_dir)
ff_profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
driver = webdriver.Firefox(ff_profile)
url = "http://www.xetra.com/xetra-en/instruments/etf-exchange-traded-funds/list-of-tradable-etfs"
driver.get(url)
dl_link = driver.find_element_by_partial_link_text("Master data")
dl_link.click()
The actual mime-type to be used in this case is:
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
How do I know that? Here is what I've done:
opened Firefox manually and navigated to the target site
when downloading the file, checked the checkbox to save this kind of file automatically
went to Help -> Troubleshooting Information and navigated to the "Profile Folder"
in the profile folder, found and opened mimetypes.rdf
inside the mimetypes.rdf found the record/resource corresponding to the excel file I've recently downloaded
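Once the MIME type is known, it has to be added to the neverAsk.saveToDisk value. A small sketch of building that value follows; never_ask_value is a hypothetical helper, and the extra types are the ones already used in the question.

```python
# MIME type reported above for the .xlsx download.
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def never_ask_value(mime_types):
    """Join MIME types into one comma-separated preference value.

    Blank entries and duplicates are dropped; order is preserved.
    """
    seen = []
    for mime in mime_types:
        mime = mime.strip()
        if mime and mime not in seen:
            seen.append(mime)
    return ",".join(seen)


value = never_ask_value(["text/csv", "application/vnd.ms-excel", XLSX_MIME])
# In the question's script this would then be passed as:
# ff_profile.set_preference("browser.helperApps.neverAsk.saveToDisk", value)
```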

How can I download a file on a click event using selenium?

I am working with Python and selenium. I want to download a file through a click event using selenium. I wrote the following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from the links named "Export Data" on the given URL. How can I achieve this, given that it works with a click event only?
Find the link using find_element(s)_by_*, then call its click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.
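The click only starts the download, so a script that closes the browser straight away may cut it short. One way to wait, sketched here under the assumptions that the download directory starts out empty and that Firefox keeps a temporary ".part" file next to each unfinished download, is to poll the directory. The function name and the timeout values are illustrative only.

```python
import os
import time


def wait_for_downloads(download_dir, timeout=30.0, poll=0.5):
    """Block until download_dir holds files and none of them is a .part file.

    Returns the sorted list of finished file names, or raises TimeoutError.
    Assumes the directory was empty before the download was triggered.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        finished = [n for n in names if not n.endswith(".part")]
        # Done when at least one file exists and no partial files remain.
        if finished and len(finished) == len(names):
            return sorted(finished)
        time.sleep(poll)
    raise TimeoutError("downloads in %s did not finish in time" % download_dir)
```

In the example above it could be called as wait_for_downloads('/tmp') after the two click() calls, provided '/tmp' is replaced by a directory that is empty at the start.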
I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
    window.file_contents = null;
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'blob';
    xhr.onload = function() {
        var reader = new FileReader();
        reader.onloadend = function() {
            window.file_contents = reader.result;
        };
        reader.readAsDataURL(xhr.response);
    };
    xhr.open('GET', %(download_url)s);
    xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
    'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the file retrieved base64 encoded (perfect for downloading binary)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    print(downloaded_file)
    if not downloaded_file:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explanation
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-origin restrictions.
Next, we inject some JavaScript into the page which fires off an AJAX request. Once the request returns a response, we load the response into a FileReader object. From there we can extract the base64-encoded content of the file by calling readAsDataURL(). We then store the base64-encoded content on window, a globally accessible object.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
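The decoding step can also be done entirely on the Python side, which keeps the injected JavaScript expression shorter. A minimal sketch, assuming the script returns the full data URL produced by readAsDataURL() instead of only the part after the comma (decode_data_url is a name made up for this example):

```python
import base64


def decode_data_url(data_url):
    """Return the raw bytes carried by a base64 data URL.

    Expects the form "data:<mime>;base64,<payload>".
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)
```

The result can be written to disk with open(..., 'wb') exactly as in the script above.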
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.
In Chrome, what I do is download the files by clicking on the links, then open the chrome://downloads page and retrieve the list of downloaded files from the shadow DOM like this:
docs = document
    .querySelector('downloads-manager')
    .shadowRoot.querySelector('#downloads-list')
    .getElementsByTagName('downloads-item')
This solution is limited to Chrome. The data also contains information such as the file path and download date. (Note that this code is JavaScript, not Python syntax.)
Here is the full working code. You can use web scraping to enter the username, password and other fields. To get the names of the fields on the webpage, use Inspect Element. An element (username, password or button) can be located by class or name.
from selenium import webdriver
# Using Chrome to access web
options = webdriver.ChromeOptions()
# Set the download path. download.default_directory is a Chrome preference,
# not a command-line switch, so it is passed through the "prefs" option.
options.add_experimental_option("prefs", {"download.default_directory": "C:/Test"})
driver = webdriver.Chrome(options=options)
# Open the website
try:
    driver.get('xxxx') # Your Website Address
    password_box = driver.find_element_by_name('password')
    password_box.send_keys('xxxx') # Password
    download_button = driver.find_element_by_class_name('link_w_pass')
    download_button.click()
    driver.quit()
except Exception:
    driver.quit()
    print("Faulty URL")
