Fellows,
I'm doing some webscraping and need to download multiple PDFs from the www1.hkexnews.hk website.
However, I encountered a problem while trying to make my Selenium chromedriver tick the box that appears every time one wants to download a PDF on the said website. The code executes, but the box still appears unclicked.
Please refer to my source code below - would appreciate any advice!
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
driver.implicitly_wait(10)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = driver.find_element_by_xpath("//a[contains(text(),'Full Version')]")
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = driver.find_element_by_id('warning-statement-accept')
print("Now clicking...", checkbox.text)
checkbox.click
Edit: Thank you guys! The downloading works fine now, just one small follow-up question - how can I modify the downloading code to save each PDF according to its company name - available through all_names = driver.find_elements_by_xpath("//div[#class='applicant-name']")?
At the moment, I am using the automatic download options as per below, I guess the downloading logic would have to be adjusted (I would rather download the PDFs with correct names already, rather than employ the dirty workaround of using Python to change their names once they're saved...)
chrome_options.add_experimental_option('prefs', {
"download.default_directory": "/Users/XXX/Downloads", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
This should do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(link)
elem = wait.until(EC.presence_of_element_located((By.XPATH,"//tr[#class='record-ap-phip']//a[contains(.,'Full Version')]")))
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[#id='warning-statement-dialog']//label[#for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[#id='warning-statement-dialog']//a[contains(#class,'btn-ok')]"))).click()
Here goes the modified version of the script which will kick out the newly opened tabs. I didn't include the downloading logic within the script. I suppose you can do that yourself.
driver.get(link)
current = driver.current_window_handle
for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//tr[#class='record-ap-phip']//a[contains(.,'Full Version')]"))):
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[#id='warning-statement-dialog']//label[#for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[#id='warning-statement-dialog']//a[contains(#class,'btn-ok')]"))).click()
wait.until(EC.new_window_is_opened)
driver.switch_to.window([window for window in driver.window_handles if window != current][0])
print(driver.current_url)
driver.close()
driver.switch_to.window(current)
driver.quit()
There are several issues here:
"checkbox" locator is wrong.
Your current code will download the first PDF file only.
It is preferably to use expected conditions explicit waits instead of implicit wait.
This should work better:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
wait = WebDriverWait(driver, 20)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(text(),'Full Version')]")))
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[./label[#for='warning-statement-accept']]//input")))
print("Now clicking...", checkbox.text)
checkbox.click
Related
I'm trying to download or rather read a csv from a website. The csv is hidden under a button. Here is the website link.
The csv I'm trying to save is on the furthest right and is denoted as a blue button labelled 'Download all data in .csv'
Here is a sample code I tried to use to download it, but the download fails:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
prefs = {"download.default_directory" : "D:/Profession/Data Extraction and Web Scraping/Stocks Data Extraction - Core Scientific/Output"}
#example: prefs = {"download.default_directory" : "C:\Tutorial\down"};
options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path="D:/Software/Selenium WebDrivers/chromedriver_win32/chromedriver", options=options)
try:
driver.implicitly_wait(5)
driver.get("https://defillama.com/chains")
downloadcsv = driver.find_element(By.CLASS_NAME, 'sc-8f0f10aa-1')
download_button = downloadcsv.find_element(By.TAG_NAME, 'button')
download_button.click() # this should save it to the folder I specified in prefs
time.sleep(3)
driver.close()
except:
print('There is an Error!')
This code tries to download the csv, however the download fails. Is there a better way I could download the CSV, especially one that is under a button? Thanks!
Looks like you need to use a proper wait method.
implicitly_wait waits for element presence while you need to wait for element to be clickable, this is more mature element state. WebDriverWait is the best practice to be used in such (and almost all the others) cases.
The following code works:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)
url = "https://defillama.com/chains"
driver.get(url)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".sc-8f0f10aa-1 button"))).click()
I click on a specific button on a page, but for some reason there is one of the buttons that I can't click on, even though it's positioned exactly like the other elements like it that I can click on.
The code below as you will notice, it opens a page, then clicks to access another page, do this step because only then can you be redirected to the real url that has the //int.
import datetime
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
with open('my_user_agent.txt') as f:
my_user_agent = f.read()
headers = {
'User-Agent': my_user_agent
}
options = Options()
options.set_preference("general.useragent.override", my_user_agent)
options.set_preference("media.volume_scale", "0.0")
options.page_load_strategy = 'eager'
driver = webdriver.Firefox(options=options)
today = datetime.datetime.now().strftime("%Y/%m/%d")
driver.get(f"https://int.soccerway.com/matches/{today}/")
driver.find_element(by=By.XPATH, value="//div[contains(#class,'language-picker-trigger')]").click()
time.sleep(3)
driver.find_element(by=By.XPATH, value="//li/a[contains(#href,'https://int.soccerway.com')]").click()
time.sleep(3)
try:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]")))
driver.find_element(by=By.XPATH, value="//a[contains(#class,'tbl-read-more-btn')]").click()
time.sleep(0.1)
except:
pass
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[#data-exponload='']//button[contains(#class,'expand-icon')]")))
for btn in driver.find_elements(by=By.XPATH, value="//div[#data-exponload='']//button[contains(#class,'expand-icon')]"):
btn.click()
time.sleep(0.1)
I've tried adding btn.location_once_scrolled_into_view before each click to make sure the button is correctly in the click position, but the problem still persists.
I also tried using the options mentioned here:
Selenium python Error: element could not be scrolled into view
But the essence of the case kept persisting in error, I couldn't understand what the flaw in the case was.
Error text:
selenium.common.exceptions.ElementNotInteractableException: Message: Element <button class="expand-icon"> could not be scrolled into view
Stacktrace:
RemoteError#chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:192:5
ElementNotInteractableError#chrome://remote/content/shared/webdriver/Errors.jsm:302:5
webdriverClickElement#chrome://remote/content/marionette/interaction.js:156:11
interaction.clickElement#chrome://remote/content/marionette/interaction.js:125:11
clickElement#chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:204:29
receiveMessage#chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:92:31
Edit 1:
I noticed that the error only happens when the element is colored orange (when they are colored orange it means that one of the competition games is happening now, in other words it is live).
But the button is still the same, it keeps the same element, so I don't know why it's not being clicked.
See the color difference:
Edit 2:
If you open the browser normally or without the settings I put in my code, the elements in orange are loaded already expanded, but using the settings I need to use, they don't come expanded. So please use the settings I use in the code so that the page opens the same.
What you missing here is to wrap the command in the loop opening those sections with try-except block.
the following code works. I tried running is several times.
import datetime
import time
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "eager"
webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options, desired_capabilities=caps)
wait = WebDriverWait(driver, 10)
today = datetime.datetime.now().strftime("%Y/%m/%d")
driver.get(f"https://int.soccerway.com/matches/{today}/")
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(#class,'language-picker-trigger')]"))).click()
time.sleep(5)
wait.until(EC.element_to_be_clickable((By.XPATH, "//li/a[contains(#href,'https://int.soccerway.com')]"))).click()
time.sleep(5)
try:
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]")))
driver.find_element(By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]").click()
time.sleep(0.1)
except:
pass
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[#data-exponload='']//button[contains(#class,'expand-icon')]")))
for btn in driver.find_elements(By.XPATH, "//div[#data-exponload='' and not(contains(#class,'status-playing'))]//button[contains(#class,'expand-icon')]"):
btn.click()
time.sleep(0.1)
UPD
We need to open only closed elements. The already opened sections should be stayed open. In this case click will always work without throwing exceptions. To do so we just need to add such indication - click buttons not inside the section where status is currently playing.
I'm working on a scraping project of Aliexpress, and I want to change the ship to country using selenium,for example change spain to Australia and click Save button and then scrap the page, I already found an answer it worked just I don't know how can I save it by clicking the button save using selenium, any help is highly appreciated. This is my code using for this task :
country_button = driver.find_element_by_class_name('ship-to')
country_button.click()
country_buttonn = driver.find_element_by_class_name('shipping-text')
country_buttonn.click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//li[#class='address-select-item ']//span[#class='shipping-text' and text()='Australia']"))).click()
Well, there are 2 pop-ups there you need to close first in order to access any other elements. Then You can select the desired shipment destination. I used WebDriverWait for all those commands to make the code stable. Also, I used scrolling to scroll the desired destination button before clicking on it and finally clicked the save button.
The code below works.
Just pay attention that after selecting a new destination pop-ups can appear again.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
options = Options()
options.add_argument("--start-maximized")
s = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=s)
url = 'https://www.aliexpress.com/'
wait = WebDriverWait(driver, 10)
actions = ActionChains(driver)
driver.get(url)
try:
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(#style,'display: block')]//img[contains(#src,'TB1')]"))).click()
except:
pass
try:
wait.until(EC.element_to_be_clickable((By.XPATH, "//img[#class='_24EHh']"))).click()
except:
pass
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "ship-to"))).click()
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "shipping-text"))).click()
ship_to_australia_element = driver.find_element(By.XPATH, "//li[#class='address-select-item ']//span[#class='shipping-text' and text()='Australia']")
actions.move_to_element(ship_to_australia_element).perform()
time.sleep(0.5)
ship_to_australia_element.click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[#data-role='save']"))).click()
I mostly used XPath locators here. CSS Selectors could be used as well
I want to download U.S. Department of Housing and Urban Development data using Python's Selenium. Here's my code.
import os
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
options = webdriver.ChromeOptions()
preferences= {"download.default_directory": os.getcwd(), "directory_upgrade": True}
options.add_experimental_option("prefs", preferences)
#options.headless = True
options.add_experimental_option('excludeSwitches', ['enable-logging'])
url = "https://hudgis-hud.opendata.arcgis.com/datasets/deteriorated-paint-index-by-county/explore"
# Path of my WebDriver
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
wait = WebDriverWait(driver, 60)
# to maximize the browser window
driver.maximize_window()
#get method to launch the URL
driver.get(url)
paths = ["#ember97", "calcite-card > div > calcite-button"]
for x in paths:
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, x))).click()
I can click the button to expand the side panel, where the CSV file button is located, but I cannot click the CSV file itself to download it. My first thought was to check for if the side panel existed within an IFRAME, so I did
seq = driver.find_elements_by_tag_name('iframe')
seq
And it returned nothing. The content is nested in a class called side-panel-ref. Is there a way to switch to this somehow so I can click that content, when iframes aren't there? What might I be missing?
Your button is inside a shadowroot.
You see this when you inspect in devtools:
Quickest and easiest way to handle this is with some JS .This is your script slightly refactored + the JS call:
url = "https://hudgis-hud.opendata.arcgis.com/datasets/deteriorated-paint-index-by-county/explore"
wait = WebDriverWait(driver, 60)
driver.maximize_window()
driver.get(url)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#ember97'))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.dataset-download-card > hub-download-card")))
driver.execute_script('document.querySelector("div.dataset-download-card > hub-download-card").shadowRoot.querySelector("calcite-card > div > calcite-button").click()')
It's a fairly lengthy JS call. It's reasonably self explanatory if you read it, there are 5 parts to it: document.querySelector(..).shadowRoot.querySelector(..).click() - but just ask if you need more support.
Please also be aware that selenium is bad at downloading files. There's no API that exposes the downloads progress. You'll need to ensure your browser remains open while you download the file.
It seems a pretty quick download so you might get away with a hard coded sleep.
Also worth a mention - If you're not a fan of the long JS, you can also break it down like so:
container = driver.find_element_by_css_selector("div.dataset-download-card > hub-download-card")
shadowRoot = driver.execute_script("return arguments[0].shadowRoot", container)
shadowRoot.find_element_by_css_selector("calcite-card > div > calcite-button").click()
I want to send a string to the web page whose text field name is "inputfield". Actually, I can send the word to the page, but when I run the program, a new "chrome" page opens, which is used for testing purposes. However, I want to send a string to the field on a chrome page that is already open.
Here my code:
from selenium import webdriver
import time
url = "https://10fastfingers.com/typing-test/turkish"
options = webdriver.ChromeOptions()
options.binary_location = r"C://Program Files//Google//Chrome//Application//chrome.exe"
chrome_driver_binary = 'chromedriver.exe'
options.add_argument('headless')
driver = webdriver.Chrome(chrome_driver_binary, options=options)
driver.get(url)
driver.implicitly_wait(10)
text_area = driver.find_element_by_id('inputfield')
text_area.send_keys("Hello")
Nothing happens when I run this code. Can you please help? Can you run it by putting a sample web page in the url part?
Thank you.
EDIT: It is working when I deleted options. But still opening a new page when I run it. Is there a way use a page which already open on background.
chrome_driver_binary = 'chromedriver.exe'
driver = webdriver.Chrome(chrome_driver_binary)
driver.get('https://10fastfingers.com/typing-test/turkish')
text_area = driver.find_element_by_id('inputfield')
text_area.send_keys("Hello")
Click the popup prior to sending keys.
driver.get('https://10fastfingers.com/typing-test/turkish')
wait=WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, "CybotCookiebotDialogBodyLevelButtonLevelOptinAllowallSelectionWrapper"))).click()
text_area = wait.until(EC.element_to_be_clickable((By.ID, "inputfield")))
text_area.send_keys("Hello")
Imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
I am not sure what is your question but if the issue is multiple tabs or windows being opened then:
you can switch between the windows as:
// you can move to specific handle
chwd = driver.window_handles
print(chwd)
driver.switch_to.window(chwd[-1])
you should shoul switch to the correct window before you can interact with elements on that window
just switch to the window that was already opened bypassing the index
If the problem is that you want to interact with an already opened chrome then you should follow below steps:
Start chrome with debug port:
<path>\chrome.exe" --remote-debugging-port=1559
Python :
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:1559")
driver = webdriver.Chrome(options=chrome_options)