Selenium download file

Selenium download file - python

I'm trying to make a Selenium program to automatically download and upload some files.
Note that I am not doing this for testing but for trying to automate some tasks.
So here's my set_preference for the Firefox profile
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/home/jj/web')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/json, text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream')
profile.set_preference("browser.helperApps.alwaysAsk.force", False);
Yet, I still see the dialog for download.

The Selenium firefox webdriver runs the firefox browser GUI. When a download is invoked firefox will present a popup asking if you want to view the file or save the file. As far as I can tell this is a property of the browser and there is no way to disable this using the firefox preferences or by setting the firefox profile variables. The only way I could avoid the firefox download popup was to use Mechanize along with Selenium. I used Selenium to obtain the download link and then passed this link to Mechanize to perform the actual download. Mechanize is not associated with a GUI implementation and therefore does not present user interface popups.
This clip is in Python and is part of a class that will perform the download action.
# These imports are required
from selenium import webdriver
import mechanize
import time
# Start the firefox browser using Selenium
self.driver = webdriver.Firefox()
# Load the download page using its URL.
self.driver.get(self.dnldPageWithKey)
time.sleep(3)
# Find the download link and click it
elem = self.driver.find_element_by_id("regular")
dnldlink = elem.get_attribute("href")
logfile.write("Download Link is: " + dnldlink)
pos = dnldlink.rfind("/")
dnldFilename = dnldlink[pos+1:]
dnldFilename = "/home/<mydir>/Downloads/" + dnldFilename
logfile.write("Download filename is: " + dnldFilename)
#### Now Using Mechanize ####
# Above, Selenium retrieved the download link. Because of Selenium's
# firefox download issue: it presents a download dialog that requires
# user input, Mechanize will be used to perform the download.
# Setup the mechanize browser. The browser does not get displayed.
# It is managed behind the scenes.
br = mechanize.Browser()
# Open the login page, the download requires a login
resp = br.open(webpage.loginPage)
# Select the form to use on this page. There is only one, it is the
# login form.
br.select_form(nr=0)
# Fill in the login form fields and submit the form.
br.form['login_username'] = theUsername
br.form['login_password'] = thePassword
br.submit()
# The page returned after the submit is a transition page with a link
# to the welcome page. In a user interactive session the browser would
# automtically switch us to the welcome page.
# The first link on the transition page will take us to the welcome page.
# This step may not be necessary, but it puts us where we should be after
# logging in.
br.follow_link(nr=0)
# Now download the file
br.retrieve(dnldlink, dnldFilename)
# After the download, close the Mechanize browser; we are done.
br.close()
This does work for me. I hope it helps. If there is an easier solution I would love to know it.

Related

Automate the DHL CSV dump download using Python

I have a requirement of automating the mail sending process with the data from DHL. Currently what we are doing is:
We have a DHL account, someone has to manually login to the account , download the CSV dump which contains the order tracking details then upload it to the server, port the data from those and process it.
So I thought of automating the whole process so that it requires minimal manual intervention.
1) Is there anyway we can automate the download process from DHL?
Note: I'm using Python

I'd start by looking for something more convenient to access with code...
searching google for "dhl order tracking api" gives:
https://developer.dhl/api-catalog
as its first result, which looks useful and exposes quite a bit of functionality.
you then need to figure out how to make a "RESTful" request, which has answers here like Making a request to a RESTful API using python, and there are lots of tutorials on the internet if you search for things like "python tutorial rest client" which points to articles like this

You can use Selenium for Python. Selenium is a package that automates a browser session. you can simulate mouse clicks and other actions using Selenium.
To Install:
pip install selenium
You will also have to install the webdriver for the browser you prefer to use.
https://www.seleniumhq.org/projects/webdriver/
Make sure that the browser version that you are using is up to date.
Selenium Documentation: https://selenium-python.readthedocs.io/
Since you are dealing with passwords and sensitive data, I am not including the code.

Login and Download
You can automate download process using selenium. Below is the sample code to automate any login process and download items from a webpage. As the requirements are not specific I'm taking general use-case and explaining how to automate the login and download process using python.
# Libraries - selenium for scraping and time for delay
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
chromeOptions = webdriver.ChromeOptions()
prefs = {"download.default_directory" : "Path to the directory to store downloaded files"}
chromeOptions.add_experimental_option("prefs",prefs)
chromedriver = r"Path to the directory where chrome driver is stored"
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chromeOptions)
# To maximize the browser window
browser.maximize_window()
# web link for login page
browser.get('login page link')
time.sleep(3) # wait for the page to load
# Enter your user name and password here.
username = "YOUR USER NAME"
password = "YOUR PASSWORD"
# username send
# you can find xpath to the element in developer option of the chrome
# referance answer "[https://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome][1]"
a = browser.find_element_by_xpath("xpath to username text box") # find the xpath for username text box and replace inside the quotes
a.send_keys(username) # pass your username
# password send
b = browser.find_element_by_xpath("xpath to password text box") # find the xpath for password text box and replace inside the quotes
b.send_keys(password) # pass your password
# submit button clicked
browser.find_element_by_xpath("xpath to submit button").click() # find the xpath for submit or login button and replace inside the quotes
time.sleep(2) # wait for login to complete
print('Login Successful') # if there is no error you will see "Login Successful" message
# Navigate to the menu or any section using it's xpath and you can click using click() function
browser.find_element_by_xpath("x-path of the section/menu").click()
time.sleep(1)
# download file
browser.find_element_by_xpath("xpath of the download file button").click()
time.sleep(1)
# close browser window after successful completion of the process.
browser.close()
This way you can automate the login and the downloading process.
Mail automation
For Mail automation use smtplib module, explore this documentation "https://docs.python.org/3/library/smtplib.html"
Process automation (Scheduling)
To automate the whole process on an everyday basis create a cron job for both tasks. Please refer python-crontab module. Documentation: https://pypi.org/project/python-crontab/enter link description here
By using selenium, smtplib, and python-crontab you can automate your complete process with minimal or no manual intervention.

Python Selenium : Can't ignore Save Dialog while trying to download a file

My scripts run on Python 3.6, Selenium 2.48 and Firefox 41 (can't upgrade, I'm on a company)
I want to download some XML files from a website using Python and Selenium Webdriver. I use a Firefox profile to avoid the dialog frame and save the file in a specific location.
profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("browser.download.dir", dloadPath)
profile.set_preference("browser.helperApps.neverAsk.openFile","application/xml,text/xml")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/xml,text/xml")
browser = webdriver.Firefox(firefox_profile=profile)
The program finds all links downloadable (tested : works)
links = []
elements = browser.find_elements_by_xpath("//a[contains(#href,'reception/')]")
for elem in elements:
href = elem.get_attribute("href")
links.append(href)
return links
To download the file I use get() from Selenium
browser.get(fileUrl)
The files I'm looking for have a very specific url, means that I can't use Requests or urllib (2 or 3) and I need to login to the website and navigate througth it, can do It with those modules.
The url is like :
https://www.example.com/cft/cft/reception/filename.xml?user=xxxxxxxx&password=xxxxxxxx
Here is the html link :
filename.xml
With my script I can access to the website, navigate throught it but when I get the file url the dialog frame pops up, with no reasons that I found.
The script works very well on other websites, I think the problem is the url.
Thanks for your help

Python transfer mechanize browser session

I'm having a little difficulty trying to navigate a website past the login screen. I've done this using mechanize. However once I navigate past the login page I want to interact with the page, click attributes, etc. which mechanize cannot do. I also want to do this all "behind the curtain" so the browser window is invisible (trying not to use selenium).
Here is the code I use to login. What can I do past this to start interacting with the page
import mechanize
br = mechanize.Browser()
#get computer browser
br.set_handle_robots(False)
#what robots?
br.open("www.website.com")
#open website
br.select_form(nr=0)
#get the main form
br.set_all_readonly(False)
for control in br.form.controls:
print control
user_control = br.form.controls[0]
user_control._value = 'username'
user_password = br.form.controls[1]
user_password._value = 'password'
br.submit()

One option would be to "transfer" cookies from mechanize to selenium and use selenium with a headless browser like PhantomJS or with a virtual display. Or, just switch to selenium+PhantomJS completely (including the authentication step).
See also:
Python: how to dump cookies of a mechanize.Browser instance?
How to save and load cookies using python selenium webdriver

How can I download a file on a click event using selenium?

I am working on python and selenium. I want to download file from clicking event using selenium. I wrote following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from links with name "Export Data" from given url. How can I achieve it as it works with click event only?

Find the link using find_element(s)_by_*, then call click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.

I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
window.file_contents = reader.result;
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
# Returns the file retrieved base64 encoded (perfect for downloading binary)
downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
print(downloaded_file)
if not downloaded_file:
print('\tNot downloaded, waiting...')
time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explaination
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain, without running into cross site scripting issues.
Next, we're injecting some javascript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we take the response and load it into a FileReader object. From there we can extract the base64 encoded content of the file by calling readAsDataUrl(). We're then taking the base64 encoded content and appending it to window, a gobally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.

In chrome what I do is downloading the files by clicking on the links, then I open chrome://downloads page and then retrieve the downloaded files list from shadow DOM like this:
docs = document
.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloads-list')
.getElementsByTagName('downloads-item')
This solution is restrained to chrome, the data also contains information like file path and download date. (note this code is from JS, may not be the correct python syntax)

Here is the full working code. You can use web scraping to enter the username password and other field. For getting the field names appearing on the webpage, use inspect element. Element name(Username,Password or Click Button) can be entered through class or name.
from selenium import webdriver
# Using Chrome to access web
options = webdriver.ChromeOptions()
options.add_argument("download.default_directory=C:/Test") # Set the download Path
driver = webdriver.Chrome(options=options)
# Open the website
try:
driver.get('xxxx') # Your Website Address
password_box = driver.find_element_by_name('password')
password_box.send_keys('xxxx') #Password
download_button = driver.find_element_by_class_name('link_w_pass')
download_button.click()
driver.quit()
except:
driver.quit()
print("Faulty URL")

Staying logged in with Selenium in Python

I am trying to log into a website and then once logged in navigate to a different page on the website remaining logged in, using Selenium. However, when I try to navigate to the different page, I found I have become logged off.
I believe this is because I do not understand how the webdriver.Firefox().get() function works exactly.
My code:
from selenium import webdriver
from Code.Other import XMLParser
#Initialise driver and go to webpage
driver = webdriver.Firefox()
URL = 'http://www.website.com'
driver.get(URL)
#Login
UserName = XMLParser.XMLParse('./Config.xml','UserName')
Password = XMLParser.XMLParse('./Config.xml','Password')
element = driver.find_elements_by_id('UserName')
element[0].send_keys(UserName)
element = driver.find_elements_by_id('Password')
element[0].send_keys(Password)
element = driver.find_elements_by_id('Submit')
element[0].click()
#Go to new page
URL = 'http://www.website.com/page1'
driver.get(URL)
Unfortunately I am navigated to the new page but I am no longer logged in. How do I fix this?

Looks like website doesn't have enough time to react on your submit in authorization form. You click submit but you don't wait for response and open another url.
Wait until some event after login (like getting cookies or some changes in DOM or just time.sleep) and only then go to another page.
P.S.: if it won't help, try to check your cookies after login and after you open new url, maybe it's problem with authorization backend or webdriver

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Selenium download file - python

Related

Automate the DHL CSV dump download using Python

Python Selenium : Can't ignore Save Dialog while trying to download a file

Python transfer mechanize browser session

How can I download a file on a click event using selenium?

Staying logged in with Selenium in Python

Categories

Resources