I am web-scraping reviews from Goodreads for a project. Here's an example of a page I've been trying: https://www.goodreads.com/book/show/2767052-the-hunger-games/reviews?
The reviews page initially shows 30 reviews with a 'Show More' button at the bottom. Selenium seems unable to click the button.
Here is the code I'm using:
showmore_button = driver.find_element(By.XPATH, '/html/body/div[1]/div/main/div[1]/div[2]/div[4]/div[4]/div/button/span[1]')
driver.execute_script("arguments[0].click();", showmore_button)
I have also tried
showmore_button.click()
but that leads to an exception stating that the element is not clickable.
For more context, my driver is set up like this:
def createdriver():
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("start-maximized")
    options.add_argument('--window-size=1920,1080')
    options.add_argument("--incognito")
    driver = webdriver.Chrome(options=options)
    return driver
and then I use:
driver = createdriver()
driver.get(url)
Where the URL is the reviews page I'm trying to scrape.
To click the Show more reviews element at the bottom of the page, first scroll it into view with scrollIntoView() after inducing WebDriverWait for visibility_of_element_located(), then click it. You can use the following locator strategies:
Code block:
driver.get('https://www.goodreads.com/book/show/2767052-the-hunger-games/reviews?')
time.sleep(5)
driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ReviewsList__listContext ReviewsList__listContext--centered']//span[contains(., 'Displaying 1 -')]"))))
driver.execute_script("arguments[0].click();", driver.find_element(By.XPATH, "//span[text()='Show more reviews']"))
Note: You have to add the following imports:
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The page keeps loading for a short time before the element is displayed, so you need to apply Selenium waits. Try setting an implicit wait after creating the driver instance, as in the code below:
driver = createdriver()
driver.implicitly_wait(10)
driver.get(url)
With this setting, each find_element call keeps searching for up to 10 seconds before raising NoSuchElementException.
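The explicit waits used in the other answers (WebDriverWait(...).until(...)) follow the same idea on the client side: a condition is called repeatedly until it returns something truthy or the timeout runs out. The sketch below illustrates that polling loop with the standard library only. The name wait_until is hypothetical and this is not Selenium's implementation; the real WebDriverWait also swallows NoSuchElementException between polls and raises its own TimeoutException, which this simplified version does not do.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or timeout elapses.

    Hypothetical helper that mimics the polling idea behind an explicit
    wait; it is an illustration, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Hand back whatever the condition produced (e.g. an element).
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With a real driver, the condition could be something like lambda: driver.find_elements(By.XPATH, "//span[text()='Show more reviews']"), which returns an empty (falsy) list until the button exists.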
Another suggestion:
As a best practice, use a relative XPath instead of an absolute one. A relative XPath is more robust: an absolute XPath may stop working if the DOM structure changes in the future. Try the relative XPath below:
showmore_button = driver.find_element(By.XPATH, '//span[contains(text(),"Show more reviews")]')
driver.execute_script("arguments[0].click();", showmore_button)
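To see why an absolute path is fragile, here is a small offline sketch that uses the standard library's xml.etree.ElementTree instead of a browser. The two markup strings are invented for illustration and are not Goodreads' real DOM, and ElementTree supports only a subset of XPath with paths evaluated relative to the root element, so the expressions differ slightly from what you would pass to Selenium.

```python
import xml.etree.ElementTree as ET

# Invented markup: version 2 wraps the button's container in an extra <section>.
PAGE_V1 = (
    "<html><body><div><main><div><button>"
    "<span>Show more reviews</span>"
    "</button></div></main></div></body></html>"
)
PAGE_V2 = (
    "<html><body><div><main><section><div><button>"
    "<span>Show more reviews</span>"
    "</button></div></section></main></div></body></html>"
)

# Absolute-style path: spells out every step from the root element.
ABSOLUTE = "./body/div/main/div/button/span"
# Relative-style path: any <span> with this text, at any depth.
RELATIVE = ".//span[.='Show more reviews']"


def find_text(markup, path):
    """Return the text of the first node matching path, or None if no match."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text
```

The absolute path matches only the first version of the page, while the relative one still finds the span after the extra wrapper is added.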
I try to catch the "next" button in a viewer with Selenium in Python, but nothing works and I get an error message.
The code I executed:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'chromedriver.exe')
driver.get('https://web.nli.org.il/sites/NLI/Hebrew/digitallibrary/pages/viewer.aspx?&presentorid=MANUSCRIPTS&docid=PNX_MANUSCRIPTS990000907570205171-1')
WebDriverWait(driver, 10)
Until now, everything has been fine. This is where the issue arises.
I want to click on the next arrow:
I get the following code while inspecting the source code of this arrow:
<div id="next" class="left"> </div>
So these are my attempts:
XPath
driver.find_elements_by_xpath('//*[@id="next"]')
returns an empty list.
id
driver.find_elements_by_id("next")
returns an empty list.
So, how can I catch this button and click on it?
That arrow is inside an iframe, so the driver needs to switch its focus to that frame first.
Then you can use an explicit wait to click on it:
CSS_SELECTOR :
div#next
Sample code :
driver = webdriver.Chrome(options=options, executable_path=r'chromedriver.exe')
driver.get('URL here')
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "MainIframe")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#next"))).click()
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can use the CSS selectors below:
.left [title='Next']
div.left
div .left [title='Next']
Or this XPath:
//div[@id='next']//child::a
There could be a different element with the id next, so target the div specifically; as far as I can see, the a is a child of that div.
You can check it with an explicit wait:
wait = WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@id='next']//child::a"))).click()
I also found an iframe with id = MainIframe and name = zoomer that contains the desired element, so you could switch to that frame and then get the element.
My first test with selenium is to click a button on a website. The first button that I need to click is this "yes you can use cookies"-buttons in the popup of a website. But it seems that selenium doesn't find that button even though I added a wait line. I tried other buttons in the popup as well, but none of them can be found by my element_to_be_clickable. The element is in an iframe, so I guess I have to change to it, but it seems that I'm doing something wrong.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver_path = "D:/Python/learning_webclicker/firefox_driver/geckodriver.exe"
firefox_path = "C:/Program Files/Mozilla Firefox/firefox.exe"
option = webdriver.FirefoxOptions()
option.binary_location = firefox_path
driver = webdriver.Firefox(executable_path=driver_path, options=option)
url = "https://web.de/"
driver.get(url)
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it(driver.find_element_by_xpath("/html/body/div[2]/iframe")))
#I tried to find the "save-all-conditionally"-element with lots of different methods:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "save-all-conditionally"))).click()
#WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, """//*[@id="save-all-conditionally"]"""))).click()
#WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "save-all-conditionally")))
# ...
This raises the error
selenium.common.exceptions.TimeoutException: Message:
And if I try to click the button directly after changing to iframe (or without checking for iframe), then I get
driver.implicitly_wait(10)
element=driver.find_element_by_xpath("""//*[@id="save-all-conditionally"]""")
element.click()
>>> selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: [id="save-all-conditionally"]
I guess, that I'm not really in the iframe (although frame_to_be_available_and_switch_to_it doesn't return an error), but I'm not sure how/what/why.
The element you are looking for is inside a nested iframe, so you need to switch into both iframes in turn.
Use the following CSS selectors to identify the iframes:
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[name='landingpage']")))
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src*='plus.web.de']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "save-all-conditionally"))).click()
Or use the XPaths below to identify the iframes:
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@name='landingpage']")))
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[contains(@src,'plus.web.de')]")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "save-all-conditionally"))).click()
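When frames are nested, each switch has to happen inside the frame entered just before it. The sketch below spells that out as a loop, starting from the top-level document. The function name switch_into_nested_frames is hypothetical, and the polling is hand-written with the standard library so the block runs without Selenium installed; it relies only on the driver's find_elements(by, value), switch_to.default_content() and switch_to.frame(element) calls. In real code, EC.frame_to_be_available_and_switch_to_it as shown above does the same job per frame.

```python
import time


def switch_into_nested_frames(driver, locators, timeout=10.0, poll=0.5):
    """Enter a chain of nested iframes, outermost first.

    locators is a list of (by, value) pairs, e.g.
    ("css selector", "iframe[name='landingpage']").
    Hypothetical helper; raises TimeoutError if a frame never appears.
    """
    # Always start from the top-level document, wherever the driver was.
    driver.switch_to.default_content()
    for by, value in locators:
        deadline = time.monotonic() + timeout
        while True:
            # The search runs inside the frame entered on the previous pass.
            frames = driver.find_elements(by, value)
            if frames:
                driver.switch_to.frame(frames[0])
                break
            if time.monotonic() >= deadline:
                raise TimeoutError(f"frame not found: {by}={value}")
            time.sleep(poll)
```

For the page in this question, the call might look like switch_into_nested_frames(driver, [("css selector", "iframe[name='landingpage']"), ("css selector", "iframe[src*='plus.web.de']")]), after which the button lookup runs inside the inner frame.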
I'm learning to use Selenium for web scraping. I have a couple of questions with the website I'm working with:
-The website has multiple pages to go over and I can't seem to find a way to locate the pages' paths and go over them. For example, the following code returns link_page as NoneType.
from selenium import webdriver
import time
driver = webdriver.Chrome('chromedriver')
driver.get('https://www.oddsportal.com/soccer/england/premier-league')
time.sleep(0.5)
results_button = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[2]/ul/li[3]/span')
results_button.click()
time.sleep(3)
season_button = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[3]/ul/li[2]/span/strong/a')
season_button.click()
link_page = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[6]/div/a[3]/span').get_attribute('href')
print(link_page.text)
driver.get(link_page)
-For some reason I have to use the results_button to be able to get the href of matches. For example, the following code tries to go to the page directly (as an attempt to circumvent problem 1 above), but the link_page lookup raises a NoSuchElementException.
from selenium import webdriver
import time
driver = webdriver.Chrome('chromedriver')
driver.get('https://www.oddsportal.com/soccer/england/premier-league/results/#/page/2')
time.sleep(3)
link_page = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[6]/table/tbody/tr[11]/td[2]/a').get_attribute('href')
print(link_page.text)
driver.get(link_page)
To locate the page links and go over them with Selenium, induce WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategy:
Using XPATH:
driver.get('https://www.oddsportal.com/soccer/england/premier-league/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='RESULTS']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[text()='2018/2019']"))).click()
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='active-page']//following::a[@x-page]/span[not(contains(., '|')) and not(contains(., '»'))]/..")))])
Console Output:
['https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/2/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/3/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/4/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/5/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/6/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/7/', 'https://www.oddsportal.com/soccer/england/premier-league-2018-2019/results/#/page/8/']
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
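As for the NoneType in the question: the absolute XPath there ends at a span inside the link, and the href attribute lives on the a element, so get_attribute('href') returns None (the XPath above climbs back to the a with /..). Calling .text on the result fails as well, because get_attribute returns a plain string rather than an element. A small defensive helper, sketched below under the hypothetical name collect_hrefs, keeps only the elements that actually carry an href; it works with any objects exposing get_attribute(name), such as Selenium WebElements.

```python
def collect_hrefs(elements):
    """Return the href of every element that has a non-empty one.

    Hypothetical helper; elements only need a get_attribute(name) method.
    """
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        # get_attribute returns None when the attribute is missing.
        if href:
            hrefs.append(href)
    return hrefs
```

The returned strings can be passed straight to driver.get(...), one page at a time.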
I want to click the button that solves the captcha through audio, but Selenium does not detect the specified id.
browser.get("https://www.google.com/recaptcha/api2/demo")
mainWin = browser.current_window_handle
iframe = browser.find_elements_by_tag_name("iframe")[0]
browser.switch_to_frame(iframe)
CheckBox = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID ,"recaptcha-anchor"))).click()
sleep(4)
audio = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID ,"recaptcha-audio-button"))).click()
To click() the button that solves the captcha through audio, note that the desired elements are within an <iframe>, so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use the following Locator Strategies:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.google.com/recaptcha/api2/demo")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src^='https://www.google.com/recaptcha/api2/anchor']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span#recaptcha-anchor"))).click()
driver.switch_to.default_content()
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='recaptcha challenge']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#recaptcha-audio-button"))).click()
Reference
Ways to deal with #document under iframe
Outro
You can find a couple of relevant discussions in:
How to click on the reCaptcha using Selenium and Java
CSS selector for reCaptcha checkbok using Selenium and vba excel
Find the reCAPTCHA element and click on it — Python + Selenium
Very useful. Just note that the text 'recaptcha challenge' in the selector below depends on the regional/language settings:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='recaptcha challenge']")))
I am trying to scrape some data from an iframe located within a webpage. The URL of the webpage is https://www.nissanoflithiasprings.com/schedule-service. I am trying to access the button shown in the image below:
When I right-click on the button (located inside the iframe) to view the source code, I am able to see the HTML id and name (see screenshot below):
The "id" for the button is "new_customer_button". However, when I use selenium webdriver's driver.find_element_by_id("new_customer_button") to access the button, the code is not able to locate the button inside the iframe and throws the following error:
NoSuchElementException: no such element: Unable to locate element: {"method":"id","selector":"new_customer_button"}
Below is the code that I have tried so far:
from selenium import webdriver
chrome_path = r"C:\Users\gh455\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.nissanoflithiasprings.com/schedule-service")
dest_iframe = driver.find_elements_by_tag_name('iframe')[0]
driver.switch_to.frame(dest_iframe)
driver.find_element_by_id("new_customer_button")
Not sure why this is happening. Any help will be appreciated. Thanks!
The element is inside multiple <iframe> tags, so you need to switch into them one by one. You should also maximize the window and use an explicit wait, as the page takes some time to load:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
chrome_path = r"C:\Users\gh455\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.nissanoflithiasprings.com/schedule-service")
wait = WebDriverWait(driver, 10)
# first frame - by css selector
wait.until(ec.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, '[src^="https://consumer.xtime.com"]')))
# second frame - by ID
wait.until(ec.frame_to_be_available_and_switch_to_it('xt01'))
driver.find_element_by_id("new_customer_button")
To click() the element with the text Make · Year · Model, note that the desired element is within nested <iframe>s, so you have to:
Induce WebDriverWait for the desired parent frame to be available and switch to it.
Induce WebDriverWait for the desired child frame to be available and switch to it.
Induce WebDriverWait for the desired element_to_be_clickable().
You can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument('disable-infobars')
driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.nissanoflithiasprings.com/schedule-service")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src*='com/scheduling']")))
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src*='consumerportal']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.button.button--action.btn.btn-secondary#new_customer_button"))).click()
Here you can find a relevant discussion on Ways to deal with #document under iframe