Headless Python Selenium MacOS to Click/Download Documents via Chromium


There is a lot written on this topic, but I have found nothing workable so far for the setup named in the title and the configuration listed below.
Here is what I am attempting to do: go to this webpage and click the CSV document icon to download the file (via XPath or CSS selectors). Either icon is fine; they download the same content.
The source code below shows what I have done so far. The script runs without errors, but no document is downloaded. How can I resolve this?
My OS, Python, ChromeDriver, and Chrome versions are:
macOS Mojave v.10.14.6, Python v.3.7.3, ChromeDriver v.77.0.3865.40, Chrome v.77.0.3865.40
from selenium import webdriver
options = webdriver.ChromeOptions()
prefs = {"download.default_directory": "SOME_PATH"}
options.add_experimental_option("prefs", prefs)
options.binary_location = 'PATH_TO_CHROME'
options.add_argument('headless')
# set the window size
options.add_argument('window-size=1200,600')
# initialize the driver
driver = webdriver.Chrome('PATH_TO_CHROME_DRIVER',
options=options)
page_url = 'http://webapps.rrc.texas.gov/eds/eds_searchUic.xhtml'
button = '//*[@id="SearchUicForm:searchTable_paginator_top"]/a[7]'
driver.get(page_url)
# wait up to 5 seconds for the elements to become available
driver.implicitly_wait(5)
driver.find_element_by_xpath(button).click()

You can comment out the line options.add_argument('headless') to see what happens in the browser. The script clicks the CSV icon and a download prompt pops up, which has to be handled for the download to proceed. Chrome options can suppress the prompt:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_experimental_option("prefs", {
"download.default_directory": r"C:\Users\xxx\downloads\Test",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing.enabled": True
})
driver = webdriver.Chrome(options=options)
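For reference, here is a minimal sketch of how the question's click and these prefs could fit together in Selenium 4 style, where find_element_by_xpath was deprecated and later removed. The helper names chrome_download_prefs and click_and_download, the settle_seconds pause, and the assumption that chromedriver is discoverable on PATH are illustrative and not part of the original answer. Whether headless downloads succeed may also depend on the Chrome/ChromeDriver version.

```python
import os
import time


def chrome_download_prefs(download_dir):
    """Build the Chrome prefs dict shown in the answer above."""
    return {
        # Assumption: Chrome is given an absolute download path.
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }


def click_and_download(page_url, xpath, download_dir, timeout=10, settle_seconds=3):
    """Open page_url headlessly and click the element located by xpath."""
    # Selenium is imported lazily so the prefs helper works without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1200,600")
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))

    # Assumes chromedriver can be located automatically (e.g. on PATH).
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Explicit wait instead of implicitly_wait + find_element_by_xpath.
        WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.XPATH, xpath))
        ).click()
        # Crude pause so quit() does not cut the download short.
        time.sleep(settle_seconds)
    finally:
        driver.quit()
```

With the question's values this would be called as click_and_download(page_url, button, "SOME_PATH").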

Related

Using Selenium with Python in Chrome to click "download" button and download PDF

I am trying to download a PDF from the following URL: https://sec.report/Document/0001670254-20-001152/
There is a download button embedded in the HTML. I am using the following code to click the button and save the download to my desktop, as defined in my path. The program runs without any errors, but the PDF does not show up on the desktop. I have tried changing the location to different places, e.g. Downloads. I have also toggled the preference in Google Chrome to download PDF files instead of automatically opening them in Chrome. Any ideas?
from selenium import webdriver
download_dir = "C:\\Users\\andrewlittle\\Desktop"
options = webdriver.ChromeOptions()
profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
"download.default_directory": download_dir , "download.extensions_to_open": "applications/pdf"}
options.add_experimental_option("prefs", profile)
chromedriver_path = os.getcwd() + '/chromedriver'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://sec.report/Document/0001670254-20-001152/document_1.pdf')
driver.close()
Thanks in advance!

See the answer below:
import time
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
download_dir = "/Users/test/Documents/"
options = Options()
options.add_experimental_option('prefs', {
"download.default_directory": download_dir,
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True
}
)
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://sec.report/Document/0001670254-20-001152/document_1.pdf')
time.sleep(3)
driver.quit()
I added time.sleep as a safety margin in case the file takes a little longer to download. It is not strictly necessary.
I also used the newer Service and Options objects for Selenium.
The key to the code is these preferences:
"download.default_directory": download_dir,
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True
These allow Chrome to download the PDF to the directory of your choice without prompting.
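As an alternative to a fixed time.sleep, the script can poll the download directory until the file is complete. The sketch below assumes the directory starts out empty and relies on Chrome writing in-progress downloads with a .crdownload suffix; wait_for_download is a hypothetical helper name, not part of Selenium.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the finished file names once no .crdownload files remain.

    Raises TimeoutError if the directory is still empty, or still holds
    partial files, when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = [n for n in names if not n.endswith(".crdownload")]
        if finished and not partial:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download did not finish within {timeout}s")
        time.sleep(poll)
```

In the answer's script this would replace time.sleep(3), i.e. call wait_for_download(download_dir) just before driver.quit().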

Unclickable PDF download page Chrome Selenium

I wrote the following procedure, based on Selenium and Chrome, to download a PDF file to a defined folder, after performing some actions on a web app:
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory' : path_to_destination,
"plugins.always_open_pdf_externally": True
# Additional options I've tried but didn't work
#,"download.prompt_for_download": False,
# 'profile.default_content_setting_values.automatic_downloads': 1,
# "helperApps.neverAsk.saveToDisk": mime_types,
# "plugin.disable_full_page_plugin_for_types": mime_types
}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(
executable_path=executable_path,
chrome_options=chrome_options
)
However, as soon as I click the link that normally displays the PDF, the following unclickable page is displayed:
As soon as I click on it manually, everything works fine and the file is correctly downloaded to the indicated folder ("path_to_destination").
I tried with:
driver.find_element_by_xpath("//*[contains(@id, 'open-button')]").click()
# Or
driver.find_element_by_xpath("//*[contains(@id, 'main-content')]").click()
Since the XPath is:
//*[@id="main-content"]/a
But it does not work.
How can I either avoid opening this second page or click the "Apri" (= Open) button?
P.S. Using Firefox and the following options, everything works fine:
# Setup
profile = webdriver.FirefoxProfile()
mime_types = "application/pdf,application/vnd.adobe.xfdf,"\
"application/vnd.fdf,application/vnd.adobe.xdp+xml"
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", full_destination)
# For PDFs
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
profile.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
profile.set_preference("pdfjs.disabled", True)

Try adding the following to prefs:
"download.prompt_for_download": False

selenium python generate data uri of an image with headless Chrome [duplicate]

My code works fine with ChromeDriver in 'normal' mode. When I switch to headless mode, it doesn't download the file. I have already tried code I found around the internet, but it didn't work.
chrome_options = Options()
chrome_options.add_argument("--headless")
self.driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'{}/chromedriver'.format(os.getcwd()))
self.driver.set_window_size(1024, 768)
self.driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': os.getcwd()}}
self.driver.execute("send_command", params)
Does anyone have any idea how to solve this problem?
PS: I don't necessarily need to use ChromeDriver. If it works with another driver, that's fine for me.

First the solution
Minimum Prerequisites:
Selenium client version: Selenium v3.141.59
Chrome version: Chrome v77.0
ChromeDriver version: ChromeDriver v77.0
To download the file by clicking the element with the text Download Data on this website, you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', service_args=["--log-path=./Logs/DubiousDan.log"])
print ("Headless Chrome Initialized")
params = {'behavior': 'allow', 'downloadPath': r'C:\Users\Debanjan.B\Downloads'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
driver.get("https://www.mockaroo.com/")
driver.execute_script("scroll(0, 250)")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#download"))).click()
print ("Download button clicked")
#driver.quit()
Console Output:
Headless Chrome Initialized
Download button clicked
Details
Downloading files through Headless Chromium has been one of the most sought-after features since Headless Chrome was introduced.
Since then, different contributors have published different workarounds, including:
Downloading with chrome headless and selenium
Python equivalent of a given wget command
Now the good news: the Chromium team has officially announced support for downloading files through Headless Chromium.
In the discussion Headless mode doesn't save file downloads, @eseckler mentioned:
Downloads in headless work a little differently. There's the Page.setDownloadBehavior devtools command to set a download folder. We're working on a way to use DevTools network interception to stream the downloaded file via DevTools as well.
A detailed discussion can be found at Issue 696481: Headless mode doesn't save file downloads
Finally, @bugdroid's revision seems to have nailed the issue for us.
[ChromeDriver] Added support for headless mode to download files
Previously, Chromedriver running in headless mode would not properly download files due to the fact it sparsely parses the preference file given to it. Engineers from the headless chrome team recommended using DevTools's "Page.setDownloadBehavior" to fix this. This changelist implements this fix. Downloaded files default to the current directory and can be set using download_dir when instantiating a chromedriver instance. Also added tests to ensure proper download functionality.
Here is the revision and commit
From ChromeDriver v77.0.3865.40 (2019-08-20) release notes:
Resolved issue 2454: Headless mode doesn't save file downloads [Pri-2]
Solution
Update ChromeDriver to the latest ChromeDriver v77.0 level.
Update Chrome to the Chrome v77.0 level (as per the ChromeDriver v76.0 release notes).
Note: Chrome v77.0 has yet to reach general availability, so until then you can download and install a development build and test with either:
Chrome Canary
Latest build from the Dev Channel
Outro
However, macOS users will have to wait for their slice of the pie, because of the issue On Chromedriver, headless chrome crashes after sending Page.setDownloadBehavior on MacOSX.

ChromeDriver version: 95.0.4638.54
Chrome version: 95.0.4638.69
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--disable-gpu")
options.add_argument('--disable-software-rasterizer')
options.add_argument("user-agent=Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166")
options.add_argument("--disable-notifications")
options.add_experimental_option("prefs", {
"download.default_directory": "C:\\link\\to\\folder",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing_for_trusted_sources_enabled": False,
"safebrowsing.enabled": False
}
)
What seemed to work was using "\\" instead of "/" in the directory path. With forward slashes no error was thrown, but no documents were downloaded either. Using double backslashes did the job.
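One way to avoid hand-writing separators is to let Python produce the path. This is only a sketch, on the assumption (suggested by the observation above, not confirmed by Chrome documentation here) that Chrome wants an absolute path using the operating system's own separators; native_download_dir is a hypothetical helper name.

```python
import os
from pathlib import Path


def native_download_dir(path):
    """Return an absolute path string using the OS's own separators."""
    # resolve() makes the path absolute; str() renders it with "\\" on
    # Windows and "/" on macOS/Linux.
    return str(Path(path).expanduser().resolve())
```

The result can then be passed as the "download.default_directory" value in the prefs dict.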

For JavaScript, use the code below:
const webdriver = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
let options = new chrome.Options();
// pass each switch as its own argument
options.addArguments('--headless', '--window-size=1500,1200');
options.setUserPreferences({ 'plugins.always_open_pdf_externally': true,
"profile.default_content_settings.popups": 0,
"download.default_directory": Download_File_Path });
driver = await new webdriver.Builder().setChromeOptions(options).forBrowser('chrome').build();
Then switch tabs as soon as you click the download button:
await driver.sleep(1000);
var Handle = await driver.getAllWindowHandles();
await driver.switchTo().window(Handle[1]);

This C# works for me.
Note the new headless option: https://www.selenium.dev/blog/2023/headless-is-going-away/
private IWebDriver StartBrowserChromeHeadlessDriver()
{
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless=new");
chromeOptions.AddArgument("--window-size=1920,1080");
chromeOptions.AddUserProfilePreference("download.default_directory", downloadFolder);
var chromeDownload = new Dictionary<string, object>
{
{ "behavior", "allow" },
{ "downloadPath", downloadFolder }
};
var driver = new ChromeDriver(driverFolder, chromeOptions, TimeSpan.FromSeconds(timeoutSecs));
driver.ExecuteCdpCommand("Browser.setDownloadBehavior", chromeDownload);
return driver;
}

import pathlib
from selenium.webdriver import Chrome
driver = Chrome()
driver.execute_cdp_cmd("Page.setDownloadBehavior", {
"behavior": "allow",
"downloadPath": str(pathlib.Path.home().joinpath("Downloads"))
})

I don't think you should use the browser to download content; leave that to Chrome developers and testers.
Instead, get the href attribute of the element you want to download and fetch it with the requests library.
If your site requires authentication, you can fetch the cookies from the browser instance and pass them to requests.Session.
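A rough sketch of that approach follows. It assumes the third-party requests package is installed and that the cookies come from driver.get_cookies(), which returns a list of dicts with "name" and "value" keys. The helper names cookies_for_requests and download_with_cookies are made up for illustration.

```python
def cookies_for_requests(selenium_cookies):
    """Convert driver.get_cookies() output to a name -> value dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def download_with_cookies(url, dest, selenium_cookies, timeout=30):
    """Stream url to the file dest, reusing the browser's cookies."""
    import requests  # third-party; imported lazily

    with requests.Session() as session:
        session.cookies.update(cookies_for_requests(selenium_cookies))
        with session.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            with open(dest, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=8192):
                    fh.write(chunk)
    return dest
```

Typical use would be to read the link with element.get_attribute("href") and then call download_with_cookies(href, "file.csv", driver.get_cookies()). Sites that also check headers such as User-Agent or Referer may need those copied over as well.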

How can I translate the webpage opened via Selenium Webdriver to English using Python?

This is my code so far:
username_input = "username"
password_input = "password"
url='myurl'
browser = webdriver.Chrome(r'chromedriver.exe')
browser.get(url)
browser.maximize_window()
username = browser.find_element_by_id("j_username")
password = browser.find_element_by_id("j_password")
username.send_keys(str(username_input))
password.send_keys(str(password_input))
browser.find_element_by_xpath('//*[@id="inner-box"]/form/label[3]/input').click()
time.sleep(2)
Once I have logged in, everything is in French, but I need it in English. How do I do this?
I have tried several things, such as Chrome options, but either didn't understand them or couldn't get them to work.
Any help will be appreciated.

Add the prefs below to auto-translate French to English:
options = Options()
prefs = {
"translate_whitelists": {"fr":"en"},
"translate":{"enabled":"true"}
}
options.add_experimental_option("prefs", prefs)
browser = webdriver.Chrome(chrome_options=options)
You can omit r'chromedriver.exe' if the driver is in the same folder as your script.

The correct solution is:
from selenium import webdriver
chrome_path = r"D:\chromedriver_win32\chromedriver"
custom_options = webdriver.ChromeOptions()
prefs = {
"translate_whitelists": {"ru":"en"},
"translate":{"enabled":"true"}
}
custom_options.add_experimental_option("prefs", prefs)
driver=webdriver.Chrome(chrome_path, options=custom_options)

I suppose you have to set up Chrome options like:
chrome_options = Options()
chrome_options.add_argument("--lang=en")

The page language is determined by the browser settings. I tried almost all the strategies discussed in the forum, but none of them worked for me. I achieved it by following the steps below.
Create a new Chrome profile (e.g. Profile 2), then move the new profile directory into the "Documents" directory under "Users".
Open Google Chrome with the new profile in "run as administrator" mode > open www.google.com > at the bottom of the page, click on "setting" > click on "search setting" > select the "region setting" > select "United Kingdom" so that pages open only in English.
Then use the following Java code snippet.
System.setProperty("webdriver.chrome.driver", "C:\\Testing Work Space\\chromedriver.exe");
// Chrome actual new profile path is "C:\Users\shah\Documents\Profile 2\"
// but you have to keep the chromeProfilePath till "\Documents\" as follows
String chromeProfilePath = "C:\\Users\\shah\\Documents\\";
ChromeOptions chroOption = new ChromeOptions();
chroOption.addArguments("user-data-dir=" + chromeProfilePath);
// Here you specify the new Chrome profile folder (Profile 2)
chroOption.addArguments("profile-directory=Profile 2");
WebDriver driver = new ChromeDriver(chroOption);
driver.get("https://facebook.com");

None of the other answers worked for me, so I found a brute-force solution:
import pyautogui
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
url = 'https://ifconfig.me/'
options = Options()
options.add_argument('--lang=fr') # set your language here
browser = webdriver.Chrome(options=options)
browser.get(url)
actionChains = ActionChains(browser)
actionChains.context_click().perform()
# this part may be fragile; debug it if needed:
for i in range(3):
    pyautogui.sleep(1)
    pyautogui.press('up')
pyautogui.press('enter')

As of September 2022, setting prefs does not seem to work for Chrome version 105. To work around this, you can either downgrade Chrome to version 95 or use the Selenium standalone Docker image. For the Docker approach, pull and run the standalone image using:
docker pull selenium/standalone-chrome:95.0-chromedriver-95.0-20211102
docker run -d -p 4444:4444 --shm-size="2g" selenium/standalone-chrome:95.0-chromedriver-95.0-20211102
Then use a remote driver in your code and apply the prefs like this:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--lang=en')
prefs = {
"translate_whitelists": {"es": "en"},
"translate": {"enabled": "true"}
}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Remote(command_executor='http://localhost:4444/wd/hub', options=options)
driver.get("https://www.amazon.es/")

Unable to download a file through requests after getting url from selenium webdriver

While working with Selenium WebDriver, I want to set the download location to a particular directory and run the browser headless, but I am unable to do both at once. When I go headless, the download location reverts to the default.
Here is the relevant piece of my code:
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs",{
"download.default_directory": os.path.join(os.getcwd(), "mydir"),
"download.prompt_for_download":False,
"download.directory_upgrade": True
})
options.add_argument('--headless')
driver = webdriver.Chrome(chrome_options=options)

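Unfortunately, ChromeDriver did not support headless downloads when this was written; per the release notes quoted above, ChromeDriver v77.0.3865.40 resolved this (issue 2454).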
