Download embedded PDF using Selenium/Python? - python

I've tried some of the solutions posted at this site but I still cannot make this thing work. I have to grab a PDF from a secured website. I'm able to get all the way to the page that has the button to create the PDF but I cannot find code that will let me then download the PDF. Here's what I got so far and any help is much appreciated!
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://service.medical.barco.com/server/jsp/login")
username = driver.find_element(By.NAME, 'j_username')
password = driver.find_element(By.NAME, 'j_password')
username.send_keys("XXX")
password.send_keys("XXX")
driver.find_element(By.CSS_SELECTOR, '[value="Log on"]').click()
# Makes the PDF and shows it in the Google PDF viewer
url = "https://service.medical.barco.com/server/spring/jsp/workstation/complianceCheckDetailReport/?displayId=932610524&date=1598328417477"
driver.get(url)
driver.find_element(By.CLASS_NAME, 'href-button').click()
# This is probably unnecessary, but I thought a direct link to the created PDF could give me a variable I could then download
pdf = "https://service.medical.barco.com/server/spring/jsp/workstation/complianceCheckDetailReport/jasper/report.pdf?format=pdf&displayId=932610524&date=1598328417477"
driver.get(pdf)

Once you get to the PDF, Chromium/Google Chrome will most likely open it in its built-in (PDFium-based) PDF viewer rather than save it. To get around this and actually download the PDF, pass a ChromeOptions() instance with the following preferences when creating the Chrome() instance:
profile = {
'download.prompt_for_download': False,
'download.default_directory': '/path/to/download/the/pdf',
'download.directory_upgrade': True,
'plugins.always_open_pdf_externally': True,
}
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', profile)
driver = webdriver.Chrome(options=options)
On a side-note, you can always use the requests module.
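Because the site sits behind a login, a plain HTTP request for the PDF would normally be rejected unless it carries the session cookies that Selenium obtained. Here is a minimal sketch of that hand-off, assuming the site authenticates by cookie alone. The helper names `cookie_header` and `fetch_with_cookies` are hypothetical, and the standard library's urllib stands in for requests to avoid an extra dependency:

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_cookies(url, selenium_cookies, dest):
    """Download url to dest, sending the browser session's cookies along."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())


# Usage after logging in with Selenium (driver and pdf as in the question):
# fetch_with_cookies(pdf, driver.get_cookies(), "report.pdf")
```

With requests, the same header string could be passed as `headers={"Cookie": ...}` to `requests.get()`.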

Related

How can I save cookies to my Firefox Profile in Selenium Python and load them in again next session?

I have been using Selenium to log into a range of webpages on Firefox and search for certain keywords. Then I go to YouTube, watch a video, and close the driver after a few seconds. I was hoping to observe how the recommended videos change over time due to the cookies kept in the Firefox profile. However, it seems that my cookies are not being stored. I have tried the following (and many other methods) so far to start the driver:
ffOptions = Options()
ffOptions.headless = False
# ffOptions.add_argument('/Users/<username>/Library/Application Support/Firefox/Profiles/<new_profile>')
ffOptions.set_preference('profile', '/Users/<username>/Library/Application Support/Firefox/Profiles/<new_profile>')
driver = webdriver.Firefox(options = ffOptions)
driver.get(url)
and then later on added this to try and save the cookies to my Firefox profile:
current_cookies = driver.get_cookies()
for cookie in current_cookies:
    print(cookie)
    driver.add_cookie(cookie)
But this doesn't save cookies to cookies.sqlite in my profile. Is there a way to save these cookies so that I can load them in again on the next session?
Maybe this will help you:
I usually initialize the webdriver on the user data path with an argument.
This opens the browser with all the user configuration, so your logins and cookies will probably be kept when you reopen the driver.
In this case I'm using Edge, but I think you can use other browsers; just find where the browser's user data folder is.
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.edge.options import Options as EdgeOptions
# Locate the Edge user data folder under the home directory (Windows layout)
userdata = Path.home() / 'AppData' / 'Local' / 'Microsoft' / 'Edge' / 'User Data'
# Point the driver at that folder so logins and cookies persist
opt = EdgeOptions()
opt.add_argument(f'--user-data-dir={userdata}')
driver = webdriver.Edge(options=opt)
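If reusing a whole browser profile is not an option, another common approach is to dump the cookies to a file at the end of one session and add them back at the start of the next. Here is a minimal sketch, assuming JSON is an acceptable storage format. The names `save_cookies` and `load_cookies` and the file name are hypothetical:

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Write the list returned by driver.get_cookies() to a JSON file."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(path):
    """Read cookies back; returns an empty list if nothing was saved yet."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


# End of a session:
# save_cookies(driver.get_cookies(), "cookies.json")
#
# Start of the next session: the browser must already be on the cookie's
# domain before add_cookie() will accept it.
# driver.get(url)
# for cookie in load_cookies("cookies.json"):
#     driver.add_cookie(cookie)
# driver.refresh()
```

This keeps the cookies outside the browser profile, so it does not write to cookies.sqlite; it only restores them into each new driver session.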

Cannot download pdf file using selenium on python [duplicate]

I am using selenium webdriver to automate downloading several PDF files. I get the PDF preview window (see below), and now I would like to download the file. How can I accomplish this using Google Chrome as the browser?
Try this code, it worked for me.
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
"download.default_directory": "C:/Users/XXXX/Desktop", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
driver = webdriver.Chrome(options=options)
I found this piece of code somewhere on Stack Overflow itself, and it serves the purpose for me without having to use Selenium at all.
import urllib.request
# URL is the direct link to the PDF
with urllib.request.urlopen(URL) as response, open("FILENAME.pdf", 'wb') as file:
    file.write(response.read())
You can download a PDF (embedded or normal) from the web using Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
download_dir = "C:\\Users\\omprakashpk\\Documents"  # for linux/*nix, download_dir="/usr/Public"
options = webdriver.ChromeOptions()
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
}
options.add_experimental_option("prefs", profile)
# The driver path is optional; if omitted, Selenium searches PATH.
driver = webdriver.Chrome(service=Service('C:\\chromedriver\\chromedriver_2_32.exe'), options=options)
driver.get(pdf_url)  # pdf_url: link to the PDF
It will download and save the pdf in directory specified. Change the download_dir location and chrome driver location as per your convenience.
You can download chrome driver from here.
Hope it helps!
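Chrome saves the file in the background, so a script that quits the driver straight after `driver.get()` can cut the download short. Here is a minimal sketch of a wait, relying on Chrome giving in-progress downloads a `.crdownload` suffix. The name `wait_for_download` and its parameters are hypothetical:

```python
import time
from pathlib import Path


def wait_for_download(directory, timeout=30.0, poll=0.5):
    """Return the finished PDFs in directory, or raise TimeoutError."""
    folder = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps a *.crdownload file around while a download is running.
        partial = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob("*.pdf"))
        if finished and not partial:
            return finished
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished PDF in {folder} after {timeout}s")
        time.sleep(poll)


# Usage (download_dir as configured above):
# files = wait_for_download(download_dir)
# driver.quit()
```

The check is simplest when the download directory starts out empty, since any PDF already present counts as finished.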
I did it and it worked, don't ask me how :)
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
#"download.default_directory": "C:/Users/517/Download", #Change default directory for downloads
#"download.prompt_for_download": False, #To auto download the file
#"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
driver = webdriver.Chrome(options=options)
You can download the PDF file using Python's requests library
import requests
pdf_url = driver.current_url # Get Current URL
response = requests.get(pdf_url)
file_name = 'filename.pdf'
with open(file_name, 'wb') as f:
    f.write(response.content)
In my case it worked without any code modification; I just needed to disable the Chrome PDF viewer.
Here are the steps to disable it:
Go into Chrome Settings
Scroll to the bottom click on Advanced
Under Privacy And Security - Click on "Site Settings"
Scroll to PDF Documents
Enable "Download PDF files instead of automatically opening them in Chrome"

How to automate secure encrypted sites in Selenium using python?

I'm new to Python. I'm trying to do automation by opening a login page in Selenium.
from selenium import webdriver
browser = webdriver.Chrome(executable_path='chromedriver')
I tried to test some sites like - 'https://www.google.com/',etc. which is working perfectly fine.
url = 'https://www.google.com/'
browser.get(url)
I'm trying to open below url,
url = 'https://yesonline.yesbank.co.in/index.html?module=login'
browser.get(url)
I got the following error in selenium browser while the url is working fine without selenium.
Access Denied
You don't have permission to access
"http://yesonline.yesbank.co.in/index.html?" on this server.
Reference
#18.ef87d317.1625646692.41fe4bc0
But when I try to open just the base URL, it opens, but the site only loads partially and keeps showing a loading indicator.
url = 'https://yesonline.yesbank.co.in'
browser.get(url)
I feel like I am missing out something while opening the login url which I'm not able to get what exactly.
I also tried changing the webdriver i.e with Firefox.
url = 'https://yesonline.yesbank.co.in'
firefox_browser = webdriver.Firefox()
And guess what, it was opening!
But as soon as I try to open the login page (even manually, by clicking the login link with the mouse):
url = 'https://yesonline.yesbank.co.in/index.html?module=login'
firefox_browser.get(url)
'firefox_browser' closes with a session reset error.
Can someone help me open secure sites in Selenium? Or is there any other way to get it done?
It's finally working with ChromeDriver after adding some arguments to it.
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
options.add_argument('--disable-blink-features=AutomationControlled')
browser = webdriver.Chrome(executable_path='chromedriver', options = options)

Blocking downloads on Chrome using Selenium with Python

I made a simple bot in Python using Selenium that searches a website for certain content and returns a specific result to the user. However, that website has an ad that downloads a file every time I click on a page element or press ENTER to advance (or, in this case, every time the bot does). I only found out about this after running a few tests while improving the bot. I tried doing it manually and the same thing happened, so it's a problem with the website itself.
But I'm guessing there's a way to completely block the downloads, because it's saving that file automatically. I don't think it makes that much difference, but this is what triggers the download:
driver.find_element(By.ID, "hero-search").send_keys(Keys.ENTER)
And I can't get around that, because I need to advance to the next page. So, is there a way to block this in Selenium?
You can block the downloads by using the Chrome preferences:
from selenium import webdriver
options = webdriver.ChromeOptions()
prefs = {
    "download.prompt_for_download": False,
    "download_restrictions": 3,  # 3 = block all downloads
}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=options)
driver.get("https://www.an_url.com")
driver.close()

logging into a website and downloading a file

I would like to log into a website and download a file. I'm using selenium and the chromedriver. Would like to know if there is a better way. It currently opens up a chrome browser window and sends the info. I don't want to see the browser window opened up and the data being sent. Just want to send it and return the data into a variable.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
def site_login(URL, ID_username, ID_password, ID_submit, name, pas):
    driver.get(URL)
    driver.find_element(By.ID, ID_username).send_keys(name)
    driver.find_element(By.ID, ID_password).send_keys(pas)
    driver.find_element(By.ID, ID_submit).click()
URL = "www.mywebsite.com/login"
ID_username = "name"
ID_password = "password"
ID_submit = "submit"
name = "myemail#mail.com"
pas = "mypassword"
resp = site_login(URL, ID_username, ID_password, ID_submit, name, pas)
You can run Chrome in headless mode. In that case the Chrome UI won't show up, but the browser still performs the same tasks. Here is an article I found on this: https://intoli.com/blog/running-selenium-with-headless-chrome/. Hope this helps.
First option: if you are able to change the driver, you can use PhantomJS, a headless browser that could be driven by Selenium. (Selenium has since dropped PhantomJS support, so this only applies to older versions.)
Second option: if the site is not dynamic (i.e. not a single-page application), or you can trace its network requests (which can be done in Chrome DevTools), you can make those requests directly with the requests library, using BeautifulSoup if you need to extract data from the page.
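The second option can be sketched with the standard library alone: log in with a form POST, keep the session cookie in a cookie jar, then fetch the file with the same opener. This assumes the site uses a plain HTML login form. The URLs, field names, and helper names below are hypothetical placeholders, and urllib stands in for requests to avoid an extra dependency:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(login_url, fields):
    """Create a POST request whose body is the URL-encoded login form."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


def login_and_fetch(login_url, fields, file_url):
    """Log in, then return the bytes of file_url using the same cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_login_request(login_url, fields)):
        pass  # the login response stores the session cookie in the jar
    with opener.open(file_url) as response:
        return response.read()


# Usage (placeholder URLs and form field names):
# data = login_and_fetch(
#     "https://example.com/login",
#     {"name": "user@example.com", "password": "mypassword"},
#     "https://example.com/files/report.csv",
# )
```

No browser window opens, and the downloaded bytes end up in a variable. Sites that build the login form with JavaScript or add hidden tokens would need those values traced in the browser's network tab first.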
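Just add these lines and pass the options to the driver:
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
This should make Chrome run in the background.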
