I am trying to automate downloading updated versions of VS Code Marketplace extensions. I have a Selenium Python script that takes in a list of extension hosting pages and names, navigates to each extension page, clicks the Version History tab, and clicks the top (most recent) download link. I change the driver's Chrome options to set Chrome's default download directory to a folder created under that extension's name.
This all works, but it is extremely time consuming because a new window needs to be opened on each iteration with a different extension, since the driver settings have to be reset to change the Chrome download location. Furthermore, Selenium guidance recommends against clicking download links, suggesting instead to capture the URL and hand it off to an HTTP request library.
To solve this, I am trying to use urllib to download from an HTTP link to a specified path. This would let me get around resetting the driver settings on every iteration, which would in turn allow me to run the driver in a single window and just open new tabs, saving time overall.
However, when I inspect the download button on an extension, the only link I can find is the href, which has a format like:
https://marketplace.visualstudio.com/_apis/public/gallery/publishers/grimmer/vsextensions/vscode-back-forward-button/0.1.6/vspackage
The examples in the documentation use links with a format like:
https://www.facebook.com/favicon.ico
with the filename on the end.
I have tried multiple functions from urllib to download from that href, but none of them seem to recognize it, so I'm not sure whether there's any way to get a link in the format the documentation shows, or some other solution.
Also, urllib seems to require the file name (i.e. extensionversionnumber.vsix) at the end of the path to download to a specified location, but I can't seem to pull the file name from the HTML either.
import os
import urllib.request
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

inputLocation = input("Enter csv file path: ")
fileLocation = os.path.abspath(inputLocation)
inputPath = input("Enter path to where packages will be stored: ")
workingPath = os.path.abspath(inputPath)

df = pd.read_csv(fileLocation)
hostingPages = df['Hosting Page'].tolist()
packageNames = df['Package Name'].tolist()

chrome_options = webdriver.ChromeOptions()

def downloadExtension(url, folderName):
    os.chdir(workingPath)
    if not os.path.exists(folderName):
        os.makedirs(folderName)
    filepath = os.path.join(workingPath, folderName)
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": filepath,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True
    })
    driver = webdriver.Chrome(options=chrome_options)
    wait = WebDriverWait(driver, 20)
    driver.get(url)
    wait.until(lambda d: d.find_element(By.ID, "versionHistory"))
    driver.find_element(By.ID, "versionHistory").click()
    wait.until(lambda d: d.find_element(By.LINK_TEXT, "Download"))
    #### attempt to use urllib to download by HTTP request rather than click ####
    link = driver.find_element(By.LINK_TEXT, "Download").get_attribute('href')
    urllib.request.urlretrieve(link, filepath)
    #### above line does not work ####
    driver.quit()

for i in range(len(hostingPages)):
    downloadExtension(hostingPages[i], packageNames[i])
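One way around both problems is a sketch along these lines, assuming the href always follows the .../publishers/&lt;publisher&gt;/vsextensions/&lt;name&gt;/&lt;version&gt;/vspackage pattern shown above: parse the publisher, extension name, and version out of the href to build a .vsix filename yourself, then hand urllib a full file path rather than a directory (urlretrieve fails when given only a directory). The gallery may also reject urllib's default User-Agent, so a browser-like header is set as a precaution; the function and file-naming scheme here are my own, not part of any marketplace API.

```python
import os
import urllib.request
from urllib.parse import urlparse

def vsix_filename(href):
    """Build a filename like 'publisher.name-version.vsix' from a marketplace
    href of the form .../publishers/<publisher>/vsextensions/<name>/<version>/vspackage."""
    parts = urlparse(href).path.strip("/").split("/")
    publisher = parts[parts.index("publishers") + 1]
    name = parts[parts.index("vsextensions") + 1]
    version = parts[parts.index("vsextensions") + 2]
    return f"{publisher}.{name}-{version}.vsix"

def download_vsix(href, folder):
    # urlretrieve needs a full file path, not just a directory
    target = os.path.join(folder, vsix_filename(href))
    # send a browser-like User-Agent in case the gallery rejects urllib's default
    req = urllib.request.Request(href, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open(target, "wb") as out:
        out.write(resp.read())
    return target
```

With this, a single driver window can collect hrefs while the actual saving is pure urllib, so the Chrome download prefs never need resetting. One caveat: depending on response headers, the vspackage endpoint may serve the payload gzip-compressed; if a saved file isn't a valid zip, try decompressing it with gzip first.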
Related
In our company we are using Selenium for web automation on Windows. Windows runs with a signed-in user and a locked screen. Recently the company changed security settings so downloads no longer start automatically. I need to click the Save As button with Python (while Windows is locked, so pyautogui is not an option). What about pywinauto or another library? Thanks.
If you can still use Selenium, try this out:
NOTE - Tested using Windows 11, Edge 97, Python 3.9, and Selenium 4.1
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service

# 'executable_path' is deprecated. Use 'service'
driver = webdriver.Edge(
    service=Service("C:\\Users\\kabarto\\Documents\\webdrivers\\msedgedriver.exe"))
driver.get("https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/")

# Find the link and use .click to save to Downloads
driver.find_element(
    By.XPATH,
    "//a[@href=\"https://msedgedriver.azureedge.net/97.0.1072.62/edgedriver_win64.zip\"]"
).click()

# Close browser when download is complete
while not os.path.exists("C:\\Users\\kabarto\\Downloads\\edgedriver_win64.zip"):
    time.sleep(1)
driver.quit()
You can also get a list of all the links, and then go through them to find the one you want:
links = driver.find_elements(By.TAG_NAME, "a")
for l in links:
    if "win64" in l.get_attribute("href"):
        print(l.get_attribute("href"))
I figured out a workaround:
1. Before downloading, open a new tab with the Edge Downloads page.
2. Trigger the download (producing the pop-up).
3. Activate the Downloads tab and use Selenium to click the desired option (Cancel, Save, Save As, ...).
How do you scrape an XML file that automatically downloads to your computer after submitting a form? I haven't seen any examples that involve a file generated from submitted data, and I haven't been able to find it in the Python documentation: https://docs.python.org/3/library/xml.etree.elementtree.html. Any suggestions or links would be very much appreciated.
from selenium import webdriver
from selenium.webdriver.support.select import Select

url = 'https://oui.doleta.gov/unemploy/claims.asp'
driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\chromedriver.exe")
driver.implicitly_wait(10)
driver.get(url)
driver.find_element_by_css_selector('input[name="level"][value="state"]').click()
Select(driver.find_element_by_name('strtdate')).select_by_value('2020')
Select(driver.find_element_by_name('enddate')).select_by_value('2022')
driver.find_element_by_css_selector('input[name="filetype"][value="xml"]').click()
select = Select(driver.find_element_by_id('states'))
# Iterate through and select all states
for opt in select.options:
    opt.click()

input('Press ENTER to submit the form')
driver.find_element_by_css_selector('input[name="submit"][value="Submit"]').click()
This post on Stack Overflow seems to be in the direction of what I'm looking for: How to download XML files avoiding the popup "This type of file may harm your computer" through ChromeDriver and Chrome using Selenium in Python
This link helped the most for my needs, so I'll add it here; however, the first link is very helpful as well, so I want to leave it too.
How to control the download of files with Selenium + Python bindings in Chrome
I've made this little Python script to automate opening the websites I need in the morning. Take a look:

Required modules:
import webbrowser

Open the websites:
webbrowser.get('firefox').open_new_tab('https://www.netflix.com')
webbrowser.get('firefox').open_new_tab('https://www.facebook.com')
webbrowser.get('firefox').open_new_tab('https://www.udemy.com')

I don't know how to wait until each webpage is loaded before opening the next one (in another tab). Any help?
You could take the approach mentioned at How to wait for the page to fully load using webbrowser method? and check for a certain element in the page manually.
Another option would be to import time and call time.sleep(5) after opening each tab, which waits 5 seconds before running the next line of code.
import webbrowser
from time import sleep

links = ['https://www.netflix.com', 'https://www.facebook.com', 'https://www.udemy.com']
for link in links:
    webbrowser.get('firefox').open_new_tab(link)
    sleep(5)
Selenium implementation:
Note: this implementation opens your URLs in multiple windows rather than a single window with multiple tabs.
I will be using the Chrome driver, which you can download from https://chromedriver.chromium.org/downloads
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)  # keeps the windows open even after the script is done running

urls = ['https://www.netflix.com', 'https://www.facebook.com', 'https://www.udemy.com']

def open_url(url):
    # Assumes the chromedriver executable is in the same directory as the script.
    # If not, mention the path to the chromedriver executable here.
    driver = webdriver.Chrome(executable_path=os.path.abspath('chromedriver'), chrome_options=chrome_options)
    driver.get(url)

for url in urls:
    open_url(url)
Is there a way to open a browser from a Python application while posting data to it via the POST method?
I tried using webbrowser.open(url) to open the browser window, but I need to also post some data to it (via POST variables), to make sure the page opens with the proper information.
As an example, I would like to open http://duckduckgo.com, while posting "q=mysearchterm" as POST data to it, so the page will open with pre-filled data.
I believe you should be using a Web Driver such as Selenium. Here is an example:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome(r"PATHTO\chromedriver.exe") # download the chromedriver and provide the full path here
browser.get('http://duckduckgo.com/')
search = browser.find_element_by_name('q')
search.send_keys("mysearchterm")
search.send_keys(Keys.RETURN)
time.sleep(15)
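If pulling in Selenium feels too heavy for this, another approach (a sketch of a common trick, not a library API) is to write a small temporary HTML page containing a hidden form that auto-submits via POST, then open that page with webbrowser: the browser itself performs the POST. The helper names below are my own, and the "q" field is just the example from the question.

```python
import html
import tempfile
import webbrowser

def build_post_page(url, fields):
    """Return an HTML page whose form auto-submits the given fields via POST."""
    inputs = "\n".join(
        f'<input type="hidden" name="{html.escape(k)}" value="{html.escape(v)}">'
        for k, v in fields.items()
    )
    return (f'<html><body onload="document.forms[0].submit()">'
            f'<form action="{html.escape(url)}" method="post">{inputs}</form>'
            f'</body></html>')

def open_with_post(url, fields):
    # write the page to a temp file and let the default browser load it
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
        f.write(build_post_page(url, fields))
    webbrowser.open("file://" + f.name)

# Example: open_with_post("https://duckduckgo.com/", {"q": "mysearchterm"})
```

Note this only works if the target site accepts a cross-origin form POST; sites with CSRF protection will reject it, in which case the Selenium approach above is the safer bet.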
I'm using selenium to pull some automated phone reporting from our phone system (Barracuda Cudatel, very nice small business system but it doesn't have an API for what I need). There is a button on the report page that has some javascript attached to it on a click listener that then tells the browser to download the file.
Obviously selenium isn't really designed to pull files like this, however all I'm trying to do is get the href of the url that would have been sent to the browser. I can then turn around and use the session credentials with requests to pull the file and do processing on it.
How do I do the following (In Python):
Query for the event listener for 'click'
Fire off the javascript
Get the resulting URL
Edit: I'm aware download location can be configured on the browser in selenium however I'm not interested in completing this task in that fashion. This is running against a selenium grid of 20 machines and the request could be routed to any of them. Since I can't pull the file through selenium I'm going to just pull it directly with requests.
Code I'm twiddling with is below.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from time import sleep
dcap = webdriver.DesiredCapabilities.CHROME
driver = webdriver.Remote(command_executor='http://gridurl:4444/wd/hub', desired_capabilities=dcap)
driver.get("http://cudatelurl")
driver.find_element_by_name("__auth_user").send_keys("user")
driver.find_element_by_name("__auth_pass").send_keys("password")
driver.find_element_by_id("manage").click()
driver.get("http://cudatelurl/#!/cudatel/cdrs")
sleep(5)
date_dropdown = Select(driver.find_element_by_xpath('//*[@id="cui-content-inner"]/div[3]/div/div/div/div[2]/div/div/div[1]/div[2]/div/select'))
date_dropdown.select_by_value("last_week")
# This is the element that has javascript attached to it the click register is
# button.widgetType.csvButtonWidget.widgetized.actionButtonType.table-action
# but I'd prefer to not hard code it
driver.find_element_by_xpath('//*[@id="cui-content-inner"]/div[3]/div/div/div/div[2]/div/div/div[1]/div[2]/button[1]')
print(driver.get_cookies())
print(driver.title)
sleep(10)
driver.close()
driver.quit()
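The hand-off described above (reusing the Selenium session's credentials with requests) can be sketched like this. The URL here is a placeholder from the question, not a real endpoint; substitute whatever URL the click listener would have requested.

```python
import requests

def cookies_from_selenium(selenium_cookies):
    """Convert the list returned by driver.get_cookies() into a plain
    name -> value dict that requests can consume."""
    return {c["name"]: c["value"] for c in selenium_cookies}

def fetch_with_session(selenium_cookies, url, out_path):
    # carry the browser's authenticated cookies over to a requests session
    session = requests.Session()
    session.cookies.update(cookies_from_selenium(selenium_cookies))
    resp = session.get(url)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

# Usage (after logging in with the driver):
#   fetch_with_session(driver.get_cookies(), "http://cudatelurl/report.csv", "report.csv")
```

Because the download happens outside the browser, it does not matter which grid node served the Selenium session, as long as the cookies are still valid for the server.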
You can still approach it with Selenium by configuring the target directory so that files of a specific MIME type are automatically downloaded (without the Save As dialog); see:
Download PDF files automatically in Firefox using Selenium WebDriver
Access to file download dialog in Firefox
Firefox + Selenium WebDriver and download a csv file automatically