Selenium. Unable to locate element from the html website - python

Here's the link of the website I'm trying to scrape (I'm training for the moment, nothing fancy):
link
Here's my script; it's quite long but nothing too complicated:
from selenium import webdriver
import time

if __name__ == "__main__":
    print("Web Scraping application started")
    PATH = "driver\chromedriver.exe"
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1200,900")
    options.add_argument('enable-logging')
    driver = webdriver.Chrome(options=options, executable_path=PATH)
    driver.get('https://fr.hotels.com/')
    driver.maximize_window()

    destination_location_element = driver.find_element_by_id("qf-0q-destination")
    check_in_date_element = driver.find_element_by_id("qf-0q-localised-check-in")
    check_out_date_element = driver.find_element_by_id("qf-0q-localised-check-out")
    search_button_element = driver.find_element_by_xpath('//*[@id="hds-marquee"]/div[2]/div[1]/div/form/div[4]/button')

    print('Printing type of search_button_element')
    print(type(search_button_element))

    destination_location_element.send_keys('Paris')
    check_in_date_element.clear()
    check_in_date_element.send_keys("29/05/2021")
    check_out_date_element.clear()
    check_out_date_element.send_keys("30/05/2021")

    close_date_window = driver.find_element_by_xpath('/html/body/div[7]/div[4]/button')
    print('Printing type of close_date_window')
    print(type(close_date_window))
    close_date_window[0].click()

    search_button_element.click()
    time.sleep(10)

    hotels = driver.find_element_by_class_name('hotel-wrap')
    print("\n")
    i = 1
    for hotel in hotels:
        try:
            print(hotel.find_element_by_xpath('//*[@id="listings"]/ol/li[' + str(i) + ']/article/section/div/h3/a').text)
            print(hotel.find_element_by_xpath('//*[@id="listings"]/ol/li[' + str(i) + ']/article/section/div/address/span').text)
        except Exception as ex:
            print(ex)
            print('Failed to extract data from element.')
        i = i + 1
        print('\n')

    driver.close()
    print('Web Scraping application completed')
And here's the error I get:
File "hotelscom.py", line 21, in <module>
destination_location_element = driver.find_element_by_id("qf-0q-destination")
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="qf-0q-destination"]"}
(Session info: chrome=90.0.4430.85)
Any idea how to fix that? I don't understand why I get this error, because that syntax is right there in the HTML code. But I guess I'm wrong.

You have multiple problems with your code and the site.
SITE PROBLEMS
1 The site is located on multiple servers, and different servers serve different HTML. I do not know whether it depends on location or not.
2 The version I have a solution for has a few serious bugs (or maybe those are features). Among them:
When a date field is open and you press Enter just to close it, the site starts the hotel search instead. So it is a problem to close input fields in the traditional way.
Selenium's clear() does not work as it is supposed to.
BUGS IN YOUR CODE
1 You define the window size in the options and then maximize the window immediately after the site opens. Use only one of the two.
2 You enter dates like "29/05/2021", but the site only recognises formats like "05/30/2021". That is a big difference.
3 You are not using any waits, and they are extremely important.
4 Your locators are wrong and unstable. Even the id-based locators did not always work for me, because after a search there are two elements for some of them. So I replaced them with CSS selectors.
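For point 2, a small helper (the function name is just for illustration, not part of the original script) can convert dates from the DD/MM/YYYY format the asker typed into the MM/DD/YYYY format this site version expects:

```python
from datetime import datetime

# Illustrative helper: convert "DD/MM/YYYY" (e.g. "29/05/2021")
# into the "MM/DD/YYYY" format this version of the site accepts.
def to_us_date(date_str):
    return datetime.strptime(date_str, "%d/%m/%Y").strftime("%m/%d/%Y")

print(to_us_date("29/05/2021"))  # → 05/29/2021
```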
Please note that my solution works only for an old version of the site. If you want a specific version to be opened, you will need to either:
Get the site by a direct IP address, like driver.get('site ip address')
Implement a strategy in your framework that recognises which site version is opened and applies inputs accordingly.
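The second option can be sketched without Selenium as a table of known locators tried in order (the selectors are the three handled in the update further down; the function and variable names are illustrative only):

```python
# Known home-page versions and their destination-field CSS selectors.
VERSION_LOCATORS = [
    ("site 1", "#qf-0q-destination"),
    ("site 2", "input[name=q-destination-srs7]"),
    ("site 3", "form[method=GET]>div>._1yFrqc"),
]

def detect_version(selector_matches):
    """Return the first version whose selector matches the opened page.

    `selector_matches` is a callable (e.g. wrapping Selenium's
    find_element) that returns True when a selector is present.
    """
    for version, selector in VERSION_LOCATORS:
        if selector_matches(selector):
            return version
    return None

# Toy usage with a fake page that only exposes the "site 2" field:
present = {"input[name=q-destination-srs7]"}
print(detect_version(lambda sel: sel in present))  # → site 2
```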
SOLUTION
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

if __name__ == "__main__":
    print("Web Scraping application started")
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1200,900")
    options.add_argument('enable-logging')
    driver = webdriver.Chrome(options=options, executable_path='/snap/bin/chromium.chromedriver')
    driver.get('https://fr.hotels.com/')
    wait = WebDriverWait(driver, 15)

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-destination")))
    destination_location_element = driver.find_element_by_css_selector("#qf-0q-destination")
    destination_location_element.send_keys('Paris, France')
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".widget-autosuggest.widget-autosuggest-visible table tr")))
    destination_location_element.send_keys(Keys.TAB)  # workaround to close destination field
    driver.find_element_by_css_selector(".widget-query-sub-title").click()
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".widget-query-group.widget-query-destination [aria-expanded=true]")))

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-localised-check-in")))
    check_in_date_element = driver.find_element_by_css_selector("#qf-0q-localised-check-in")
    check_in_date_element.send_keys(Keys.CONTROL, 'a')  # workaround to replace clear() method
    check_in_date_element.send_keys(Keys.DELETE)  # workaround to replace clear() method
    # check_in_date_element.click()
    check_in_date_element.send_keys("05/30/2021")
    # wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#qf-0q-localised-check-out")))
    check_out_date_element = driver.find_element_by_id("qf-0q-localised-check-out")
    check_out_date_element.click()
    check_out_date_element.send_keys(Keys.CONTROL, 'a')
    check_out_date_element.send_keys(Keys.DELETE)
    check_out_date_element.send_keys("05/31/2021")
    driver.find_element_by_css_selector(".widget-query-sub-title").click()  # workaround to close end date

    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hds-marquee button"))).click()
I spent a few hours on this; the task just seemed interesting to me.
It works for this UI:
The code can still be optimised. It's up to you.
UPDATE:
I found out that the site has at least three home pages, with three different locators for the Destination and other fields.
The easiest workaround that came into my mind is something like this:
try:
    element = driver.find_element_by_css_selector("#qf-0q-destination")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#qf-0q-destination")))
        destination_location_element = driver.find_element_by_css_selector("#qf-0q-destination")
        print("making input to Destination field of site 1")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 1 not found")

try:
    element = driver.find_element_by_css_selector("input[name=q-destination-srs7]")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name=q-destination-srs7]")))
        destination_location_element = driver.find_element_by_css_selector("input[name=q-destination-srs7]")
        print("making input to Destination field of site 2")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 2 is not found")

try:
    element = driver.find_element_by_css_selector("form[method=GET]>div>._1yFrqc")
    if element.is_displayed():
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "form[method=GET]>div>._1yFrqc")))
        destination_location_element = driver.find_element_by_css_selector("form[method=GET]>div>._1yFrqc")
        print("making input to Destination field of site 3")
        destination_location_element.send_keys('Paris, France')
        # input following data
except:
    print("Page 3 is not found")
But the best solution would be to have a direct access to a specific server that has only one version available.
Please also note that if you access the site via the direct link for France, https://fr.hotels.com/?pos=HCOM_FR&locale=fr_FR, your input dates will be in the format you initially specified, for example 30/05/2021.

Try this
driver.find_element_by_xpath(".//div[contains(@class,'destination')]/input[@name='q-destination']")
Also, please add a wait after you maximize the window.

You are missing a wait / sleep before finding the element.
So, just add this:
element = WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.ID, "qf-0q-destination")))
element.click()
To use this you will have to add the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


Python Selenium Chrome - send_keys() not sending keys during 2nd iteration of loop when scraping WhitePages

I'm writing a function that opens WhitePages, searches a person's name and location, and scrapes their phone number and address. It does this by:
Navigating to whitepages.com
Finding the name <input> and sending it keys (send_keys(persons_name))
Finding the location <input> and sending it keys (send_keys(my_city))
Finding the search button <button> and clicking it
On the search results page, finding the link <a> to the person's page
On the person's page, finding and returning the person's landline and address
When I run the function in a loop over a list of names, it runs successfully on the first iteration but not the second. For testing purposes, I'm running the WebDriver with a head/GUI so that I can verify what is going on. On the second iteration, the function successfully finds the name <input> but doesn't input the person's name via send_keys(), then successfully finds the location <input> and inputs the location, then successfully finds and click()s the search button.
Since there must be a name in the name <input> for a search to be done, no search occurs, and red text appears under the name <input> saying "Last name is required" (that's how I know for sure send_keys() is failing). I then get a NoSuchElementException when the program tries to find a search-result element that doesn't exist, since no search-results page was loaded.
(Note: by default, WhitePages denies access to the program when trying to hit search; the options.add_argument('--disable-blink-features=AutomationControlled') in the code below circumvents that.)
So, what may be happening that is causing send_keys() to fail, and how do I fix it?
Full code:
from selenium import webdriver
# for passing a URL as a service object into the Chrome webdriver initializing method
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# for clicking buttons
from selenium.webdriver.common.action_chains import ActionChains
# raised when using find_element() and no element exists matching the given criteria
from selenium.common.exceptions import NoSuchElementException
# for specifying to run the browser headless (w/o UI) and to suppress warnings in console output
from selenium.webdriver.chrome.options import Options
# for choosing an option from a dropdown menu
from selenium.webdriver.support.select import Select

def scrape_individual_info_wp(driver, individual_name, city_state):
    # FIND INDIVIDUAL ON WHITEPAGES & NAVIGATE TO THEIR INDIVIDUAL PAGE
    driver.get('https://www.whitepages.com/')
    # find name input
    driver.find_element(By.XPATH, "//form/div/div/div/div/input").send_keys(individual_name)  # attempt to find the input *relatively*
    # find location input
    driver.find_element(By.XPATH, "//form/div/div/following-sibling::div/div/div/input").send_keys(city_state)
    # find & click search button
    driver.find_element(By.XPATH, "//form/div/div/button").click()
    # FIND INDIVIDUAL IN SEARCH RESULTS
    # click (first) free search result link
    driver.find_element(By.XPATH, "//div[@class='results-container']/a").click()
    # SCRAPE PERSON'S INFO
    landline = driver.find_element(By.XPATH, "//div[contains(text(),'Landlines')]/following-sibling::div/a").text.strip()
    address_info = driver.find_element(By.XPATH, "//p[contains(text(),'Current Address')]/parent::div/div/div/div/a").text.strip().split('\n')
    address = address_info[0]
    city_state_zip = address_info[1]
    return [driver.current_url, address, city_state_zip, landline]

# selenium webdriver setup
options = webdriver.ChromeOptions()
# for the webdriver; suppresses warnings in terminal
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# options.add_argument("--headless")
# options.add_argument('--disable-gpu')
# options.add_argument('--no-sandbox')
options.add_argument('--start-maximized')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
# below, you provide the path to the WebDriver for the browser of your choice, not the path to the browser .exe itself
# the WebDriver is a browser extension that you must install in order for Selenium to work with that browser
driver = webdriver.Chrome(service=Service(r'C:\Users\Owner\OneDrive\Documents\Gray Property Group\Prospecting\Python\Selenium WebDriver for Chrome\chromedriver.exe'), options=options)
driver.implicitly_wait(10)
# driver.maximize_window()

from time import sleep

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']
for name in names:
    print(name + ':')
    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')
    for field in individual_info:
        print('\t' + field)
    print('\n')
driver.quit()
Output:
Kevin J Haggerty:
https://www.whitepages.com/name/Kevin-J-Haggerty/Bedford-NH/PLyZ4BaGl8Q
26 Southgate Dr
Bedford, NH 03110
(603) 262-9114
Patricia B Halliday:
Traceback (most recent call last):
(...)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='results-container']/a"}
Screenshot of browser (see arrow / red text):
Try with the below code once.
Navigate to the website once, and use driver.back() to come back to the original page.
Use an explicit wait for the elements to appear, and use good locators like ID or CLASS_NAME to locate the elements.
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

def scrape_individual_info_wp(driver, individual_name, city_state):
    name_field = wait.until(EC.element_to_be_clickable((By.ID, "desktopSearchBar")))
    name_field.clear()  # Clear the field to enter a new name
    name_field.send_keys(individual_name)
    state_field = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "pa-3")))
    state_field.clear()  # Clear the field and enter the state
    state_field.send_keys(city_state)
    search = wait.until(EC.element_to_be_clickable((By.ID, "wp-search")))
    search.click()
    time.sleep(2)  # Using sleep to make sure the data is loaded. Can use waits to wait for the data to appear and extract the same.
    # Code to scrape data.
    driver.back()

driver = webdriver.Chrome(service=Service("C:/expediaproject/Chromedriver/chromedriver.exe"))
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.whitepages.com/")

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']
state = "Manchester, NH"
for name in names:
    scrape_individual_info_wp(driver, name, state)
driver.quit()
One of the page's scripts reopens the page if any input is not empty when the script runs.
(There are too many scripts on the page; I was not able to narrow down which one it is.)
A simple solution is to add sleep(1) after driver.get(...).
driver.get('https://www.whitepages.com/')
sleep(1) # Add this
Power usage
sleep(1) significantly slows down the script, especially if looping through many names.
Instead, we can use two tabs alternately and prepare each tab for the next-next iteration.
Setup before loop:
driver.get('https://www.whitepages.com/')
driver.execute_script('window.open("https://www.whitepages.com/")')
driver.switch_to.window(driver.window_handles[0])
next_index = 1
Prepare for the next and next-next iterations after processing each name:
for name in names:
    ...
    # Prepare this tab for next-next iteration
    driver.get('https://www.whitepages.com/')
    # Switch to another tab for next iteration
    driver.switch_to.window(driver.window_handles[next_index])
    next_index = 1 - next_index
Comment out driver.get(...) in scrape_individual_info_wp function:
# driver.get('https://www.whitepages.com/')
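The `next_index = 1 - next_index` line simply alternates the index between 1 and 0 on each pass, so the two tabs are used in turn; a toy run shows the pattern:

```python
# Toy demonstration of the tab-alternation index: it flips 1, 0, 1, 0, ...
next_index = 1
used = []
for _ in range(4):  # stands in for the loop over names
    used.append(next_index)
    next_index = 1 - next_index
print(used)  # → [1, 0, 1, 0]
```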
You may create a new driver for each person, so in each iteration you start from the home page and navigate to your desired pages.
I have done these kinds of things using Selenium in Python, and this was my common approach when scraping multiple pages.
names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']
for name in names:
    print(name + ':')
    driver = webdriver.Chrome(service=Service("C:/expediaproject/Chromedriver/chromedriver.exe"))
    wait = WebDriverWait(driver, 30)
    driver.get("https://www.whitepages.com/")
    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')
    for field in individual_info:
        print('\t' + field)
    print('\n')
    driver.quit()

I am trying to use selenium to load a waiting list and click the button, but I can't seem to find the element

I've tried find_element_by_class_name and link text, and both result in a NoSuchElementException. I'm just trying to click the Join waitlist button on https://v2.waitwhile.com/l/fostersbarbershop/list-view - any assistance is greatly appreciated.
from selenium import webdriver
import time
PATH = "C:\Python\Pycharm\attempt\Drivers\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://v2.waitwhile.com/l/fostersbarbershop/list-view")
join = False
while not join:
    try:
        joinButton = driver.find_element_by_class_name("disabled")
        print("Button isnt ready yet.")
        time.sleep(2)
        driver.refresh()
    except:
        joinButton = driver.find_element_by_class_name("public-submit-btn")
        print("Join")
        joinButton.click()
        join = True
It seems you have a synchronization issue.
Induce WebDriverWait() and wait for element_to_be_clickable() with the following ID locators:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "join-waitlist"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "ww-name"))).send_keys("TestUser")
You need to import the below libraries.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser snapshot
The page you're trying to automate is Angular. Typically, what happens in script-based pages is: you download the source, the page-load event is classed as complete, then some JS scripts run to get/update the page content. Those scripts (which can take seconds to complete) update the DOM with the page you see.
In contrast, Selenium can only recognise that the page is loaded - it will be attempting to find those elements at this readystate point and is unaware of any running scripts.
You will need to wait for your element to be ready before you proceed.
The easy solution is to add an implicit wait to your script.
An implicit wait will ignore the NoSuchElementException until the timeout has been reached.
See here for more information on waits.
This is the line of code to add:
driver.implicitly_wait(10)
Just adjust the time based on what you need.
You only need it once.
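Conceptually, an implicit wait behaves like a retry loop that swallows the not-found error until a deadline passes. A Selenium-free sketch of the idea (all names here are illustrative, with LookupError standing in for NoSuchElementException):

```python
import time

def find_with_timeout(lookup, timeout=10.0, poll=0.5):
    # Keep retrying `lookup`, ignoring "not found" errors,
    # until it succeeds or the timeout expires.
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except LookupError:  # stands in for NoSuchElementException
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)

# Toy usage: the "element" only appears on the third poll.
attempts = {"n": 0}
def fake_lookup():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("no such element")
    return "join-button"

print(find_with_timeout(fake_lookup, timeout=5, poll=0.01))  # → join-button
```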
This is the code that works for me:
driver = webdriver.Chrome()
driver.get("https://v2.waitwhile.com/l/fostersbarbershop/list-view")
driver.implicitly_wait(10)
join = False
while not join:
    try:
        joinButton = driver.find_element_by_class_name("disabled")
        print("Button isnt ready yet.")
        time.sleep(2)
        driver.refresh()
    except:
        joinButton = driver.find_element_by_class_name("public-submit-btn")
        print("Join")
        joinButton.click()
        join = True
This is the end state:

Problems logging in to site with Selenium

I'm trying to learn how to use Selenium to log into a site: Ingram-Micro. I made a script and it worked on a different page: https://news.ycombinator.com/login.
Now I'm trying to apply the same thing to Ingram-Micro and I'm stuck; I don't know what else to try. The problem I'm having is an error/message saying the submit element is not clickable; there is an accept-cookies button at the bottom of the page which seems to be causing the problem.
I've tried to account for it, but I always get an error saying that element doesn't exist. Yet if I don't try to click on the accept-cookies element, I get the original error saying the submit button isn't clickable. Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
import time

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')

url = "https://usa.ingrammicro.com/_layouts/CommerceServer/IM/Login.aspx?returnurl=//usa.ingrammicro.com/"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)

def login():
    USERNAME = 'email'
    PASSWORD = 'password'
    element = driver.find_element_by_link_text('I ACCEPT')
    if element.is_displayed():
        print("Element found")
        element.click()
    else:
        print("Element not found")
    driver.find_element_by_id('okta-signin-username').send_keys(USERNAME)
    driver.find_element_by_id('okta-signin-password').send_keys(PASSWORD)
    driver.find_element_by_id('okta-signin-submit').click()

login()

try:
    me = driver.find_element_by_id("login_help-about")
    print(f"{me.text} Element found")
except NoSuchElementException:
    print('Not found')

driver.quit()
Here are the errors I get:
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <input class="button button-primary" type="submit" value="Log in" id="okta-signin-submit" data-type="save"> is not clickable at point (365, 560). Other element would receive the click: <p class="cc_message">...</p>
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate
element: {"method":"link text","selector":"I ACCEPT"}
(Session info: headless chrome=84.0.4147.125)
The challenge you face is synchronisation around scripts.
The chain of events on this site is: 1) the page is loaded, 2) it kicks off its javascript, 3) that slides the cookie window into view...
However, after the page is loaded, Selenium doesn't know about the scripts, so it thinks it is good to go. It's trying to click the button before it's there and gets upset that it can't find it (NoSuchElementException).
There are different sync strategies - what works here is a WebDriverWait, to tell Selenium to wait (without error) until your object reaches the specified expected conditions.
You can read more about waits and expected conditions here
Try this code.
For the cookie "I ACCEPT" button, I changed the identifier to an xpath (since I like xpaths) and wrapped it in a WebDriverWait, waiting for the object to be clickable...
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
chrome_options = Options()
#chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
url = "https://usa.ingrammicro.com/_layouts/CommerceServer/IM/Login.aspx?returnurl=//usa.ingrammicro.com/"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
def login():
    USERNAME = 'email'
    PASSWORD = 'password'
    element = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//a[text()="I ACCEPT"]')))
    if element.is_displayed():
        print("Element found")
        element.click()
    else:
        print("Element not found")
    driver.find_element_by_id('okta-signin-username').send_keys(USERNAME)
    driver.find_element_by_id('okta-signin-password').send_keys(PASSWORD)
    driver.find_element_by_id('okta-signin-submit').click()

login()
Note that I had to remove headless to check it worked, and there are 3 additional imports at the top.
WebDriverWait is great when you don't have lots of complicated objects, or have objects with different wait conditions.
An alternative sync (and easier, in my opinion) is to set an implicit wait ONCE at the start of your script - this configures the driver object.
driver.implicitly_wait(10)
As that link earlier says:
An implicit wait tells WebDriver to poll the DOM for a certain amount
of time when trying to find any element (or elements) not immediately
available. The default setting is 0. Once set, the implicit wait is
set for the life of the WebDriver object.
You can use it like this... I'm not redoing all the code; just add this one line after you create your driver, and your code worked:
.....
url = "https://usa.ingrammicro.com/_layouts/CommerceServer/IM/Login.aspx?returnurl=//usa.ingrammicro.com/"
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
driver.implicitly_wait(10) # seconds
def login():
    USERNAME = 'email'
    PASSWORD = 'password'
    element = driver.find_element_by_link_text('I ACCEPT')
    if element.is_displayed():
        print("Element found")
        element.click()
    else:
        print("Element not found")
........
You probably need to click on the div above the input. Try something like this:
child = driver.find_element_by_id('okta-signin-submit')
parent = child.find_element_by_xpath('..') # get the parent
parent.click() # click parent element
UPDATE: This worked great with geckodriver without headless, but not with chromedriver. So instead I've tried something else: instead of clicking the button, let's just hit Enter in the form and submit it that way:
from selenium.webdriver.common.keys import Keys
...
driver.find_element_by_id('okta-signin-username').send_keys(USERNAME)
password_field = driver.find_element_by_id('okta-signin-password')
password_field.send_keys(PASSWORD)
password_field.send_keys(Keys.RETURN)

Dynamically generated element -NoSuchElementException: Message: no such element: Unable to locate element?

I am having trouble selecting a load more button on a Linkedin page. I receive this error in finding the xpath: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
I suspect that the issue is that the button is not visible on the page at that time. So I have tried actions.move_to_element. However, the page scrolls just below the element, so that the element is no longer visible, and the same error subsequently occurs.
I have also tried move_to_element_with_offset, but this hasn't changed where the page scrolls to.
How can I scroll to the right location on the page such that I can successfully select the element?
My relevant code:
from time import sleep

import parameters
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('C:\\Users\\Root\\Downloads\\chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys(parameters.linkedin_username)
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys(parameters.linkedin_password)
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
After playing around with it, I seem to have figured out where the problem is stemming from. The error
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]"}
(Session info: chrome=81.0.4044.113)
always correctly states the problem it is encountering; it's simply not able to find the element. The possible causes of this include:
Element not present at the time of execution
Dynamically generated content
Conflicting names
In your case, it was the second point: the displayed content is loaded dynamically as you scroll down, so when your profile first loads, the skills section isn't actually present in the DOM. To solve this, you simply have to scroll to the section so that it gets added to the DOM.
This line is the trick here. It positions the page at the correct panel, thus loading and applying the data to the DOM.
driver.execute_script("window.scrollTo(0, 1800)")
Here's my code (Please change it as necessary)
from time import sleep
# import parameters
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('../chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys('')
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys('')
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
sleep(3)
# driver.execute_script("window.scrollTo(0, 1800)")
sleep(3)
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
Output
Update
In regard to your newer problem, you need to implement a continuous-scroll method that lets you dynamically update the skills section. This requires a lot of change and should ideally be asked as another question.
I have also found a simple solution by setting the scroll to the correct threshold. For y=3200 seems to work fine for all the profiles I've checked including yours, mine and few others.
driver.execute_script("window.scrollTo(0, 3200)")
If the button is not visible on the page at the time of loading, then use the until method to delay execution:
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Button is ready!")
except TimeoutException:
    print("Loading took too much time!")
Example is taken from here
To get the exact location of the element, you can use the following method to do so.
element = driver.find_element_by_id('some_id')
element.location_once_scrolled_into_view
This actually intends to return the coordinates (x, y) of the element on the page, but it also scrolls right down to the target element. You can then use the coordinates to make a click on the button. You can read more on that here.
You are getting NoSuchElementException error when the locators (i.e. id / xpath/name/class_name/css selectors etc) we mentioned in the selenium program code is unable to find the web element on the web page.
How to resolve NoSuchElementException:
Apply WebDriverWait: allow the webdriver to wait for a specific time
Use a try/catch block
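Both remedies boil down to the same retry loop. Here is a browser-free sketch of that loop (the `find` argument is a hypothetical stand-in for any locator call, e.g. `lambda: driver.find_element_by_id('IdOfMyElement')`), mirroring what WebDriverWait does internally:

```python
import time

def wait_for(find, timeout=10, poll=0.5):
    """Call `find` repeatedly until it returns a truthy value or the timeout
    expires. `find` stands in for any locator call on a real driver."""
    end = time.time() + timeout
    while True:
        try:
            value = find()
            if value:
                return value
        except Exception:
            pass  # element not there yet; keep polling
        if time.time() > end:
            raise TimeoutError("element not found within %s seconds" % timeout)
        time.sleep(poll)
```

With a real driver you would call it as `element = wait_for(lambda: driver.find_element_by_id('IdOfMyElement'))`.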
So before performing an action on a web element, you need to bring it into view. I have removed the unwanted code and avoided hard-coded waits, as they are not good practice for dealing with synchronization issues. Also, you have to scroll down before clicking the "Show more" button, otherwise it will not work.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path="path of chromedriver.exe")
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
driver.maximize_window()
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_key"))).send_keys("email id")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_password"))).send_keys("password")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//button[@class='btn__primary--large from__button--floating']"))).click()
driver.get("https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk")
driver.maximize_window()
driver.execute_script("scroll(0, 250);")
buttonClick = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//span[text()='Show more']")))
ActionChains(driver).move_to_element(buttonClick).click().perform()

Wait until page is loaded with Selenium WebDriver for Python

I want to scrape all the data of a page implemented by a infinite scroll. The following python code works.
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means that every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time-efficient: the page may finish loading the new content in less than 5 seconds. How can I detect whether the page has finished loading the new content every time I scroll down? If I can detect this, I can scroll down again to see more content as soon as the page has finished loading, which is more time-efficient.
The webdriver will wait for a page to load by default via the .get() method.
As you may be looking for some specific element as #user227215 said, you should use WebDriverWait to wait for an element located in your page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
I have used it for checking alerts. You can use any of the other locator methods to find the element.
EDIT 1:
I should mention that the webdriver waits for a page to load by default. It does not wait for loading inside frames or for AJAX requests. It means that when you use .get('url'), your browser will wait until the page is completely loaded and then move on to the next command in the code. But when you are posting an AJAX request, webdriver does not wait, and it's your responsibility to wait an appropriate amount of time for the page, or a part of the page, to load; that is why there is a module named expected_conditions.
Trying to pass find_element_by_id to the constructor for presence_of_element_located (as shown in the accepted answer) caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('url')
timeout = 5
try:
    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
except TimeoutException:
    print("Timed out waiting for page to load")
This matches the example in the documentation. Here is a link to the documentation for By.
Find below 3 methods:
readyState
Checking page readyState (not reliable):
def page_has_loaded(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    page_state = self.driver.execute_script('return document.readyState;')
    return page_state == 'complete'
The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
    self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
    try:
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    except NoSuchElementException:
        return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of method:
@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
    self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
    old_page = self.find_element_by_tag_name('html')
    yield
    WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
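To see why the staleness_of pattern avoids the race condition, here is a minimal, browser-free sketch of the same idea; `get_page` and `is_stale` are hypothetical stand-ins for grabbing the old `<html>` element and for Selenium's staleness_of condition:

```python
import contextlib
import time

@contextlib.contextmanager
def wait_for_page_load(get_page, is_stale, timeout=10, poll=0.05):
    """Remember a handle to the old page, run the body (e.g. a click that
    navigates), then poll until that handle goes stale."""
    old_page = get_page()
    yield
    end = time.time() + timeout
    while not is_stale(old_page):
        if time.time() > end:
            raise TimeoutError("old page never went stale")
        time.sleep(poll)
```

Because the handle is captured before the click, a slow browser cannot fool the wait: the old page either goes stale or the wait times out.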
As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:
element_present = EC.presence_of_element_located((By.ID, 'element_id'))
WebDriverWait(driver, timeout).until(element_present)
It was difficult for me to find in one place all the possible locators that can be used with By, so I thought it would be useful to provide the list here.
According to Web Scraping with Python by Ryan Mitchell:
ID
Used in the example; finds elements by their HTML id attribute
CLASS_NAME
Used to find elements by their HTML class attribute. Why is this
function CLASS_NAME not simply CLASS? Using the form object.CLASS
would create problems for Selenium's Java library, where .class is a
reserved method. In order to keep the Selenium syntax consistent
between different languages, CLASS_NAME was used instead.
CSS_SELECTOR
Finds elements by their class, id, or tag name, using the #idName,
.className, tagName convention.
LINK_TEXT
Finds HTML tags by the text they contain. For example, a link that
says "Next" can be selected using (By.LINK_TEXT, "Next").
PARTIAL_LINK_TEXT
Similar to LINK_TEXT, but matches on a partial string.
NAME
Finds HTML tags by their name attribute. This is handy for HTML forms.
TAG_NAME
Finds HTML tags by their tag name.
XPATH
Uses an XPath expression ... to select matching elements.
From selenium/webdriver/support/wait.py
from selenium.webdriver.support.wait import WebDriverWait

driver = ...
element = WebDriverWait(driver, 10).until(
    lambda x: x.find_element_by_id("someId"))
On a side note, instead of scrolling down 100 times, you can check whether there are no more modifications to the DOM (we are in the case where the bottom of the page is AJAX lazy-loaded):
def scrollDown(driver, value):
    driver.execute_script("window.scrollBy(0," + str(value) + ")")

# Scroll down the page
def scrollDownAllTheWay(driver):
    old_page = driver.page_source
    while True:
        logging.debug("Scrolling loop")
        for i in range(2):
            scrollDown(driver, 500)
            time.sleep(2)
        new_page = driver.page_source
        if new_page != old_page:
            old_page = new_page
        else:
            break
    return True
Have you tried driver.implicitly_wait? It is like a setting for the driver: you only call it once per session, and it basically tells the driver to wait the given amount of time before each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds, it will execute the command as soon as possible, waiting up to 10 seconds before it gives up. I've used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this is helpful.
Be sure to use a lower-case 'w' in implicitly_wait.
Here I did it using a rather simple form:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
searchTxt = ''
while not searchTxt:
    try:
        searchTxt = browser.find_element_by_name('NAME OF ELEMENT')
        searchTxt.send_keys("USERNAME")
    except:
        continue
Solution for AJAX pages that continuously load data. The previous methods stated do not work. What we can do instead is grab the page DOM, hash it, and compare the old and new hash values over a delta time.
import time
from selenium import webdriver
def page_has_loaded(driver, sleep_time=2):
    '''
    Waits for page to completely load by comparing current page hash values.
    '''
    def get_page_hash(driver):
        '''
        Returns html dom hash
        '''
        # can find element by either 'html' tag or by the html 'root' id
        dom = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
        # dom = driver.find_element_by_id('root').get_attribute('innerHTML')
        dom_hash = hash(dom.encode('utf-8'))
        return dom_hash

    page_hash = 'empty'
    page_hash_new = ''
    # comparing old and new page DOM hash together to verify the page is fully loaded
    while page_hash != page_hash_new:
        page_hash = get_page_hash(driver)
        time.sleep(sleep_time)
        page_hash_new = get_page_hash(driver)
        print('<page_has_loaded> - page not loaded')
    print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
How about putting WebDriverWait in a while loop and catching the exceptions?
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
while True:
    try:
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
        break  # it will break from the loop once the specific element is present
    except TimeoutException:
        print("Loading took too much time! Trying again")
You can do that very simply with this function:
def page_is_loading(driver):
    return driver.execute_script("return document.readyState") == "complete"
and when you want to do something after the page has finished loading, you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
while not page_is_loading(Driver):
    continue
Driver.execute_script("alert('page is loaded')")
Use this in your code:
from selenium import webdriver
driver = webdriver.Firefox() # or Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.......")
Or you can use this code if you are looking for a specific tag:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox() #or Chrome()
driver.get("http://www.......")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "tag_id"))
    )
finally:
    driver.quit()
Very good answers here. Quick example of wait for XPATH.
# wait for sizes to load - 2s timeout
try:
    WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
        (By.XPATH, "//div[@id='stockSizes']//a")))
except TimeoutException:
    pass
I struggled a bit to get this working, as it didn't work for me as expected; anyone who is still struggling to get this working may check this.
I want to wait for an element to be present on the webpage before proceeding with my manipulations.
We can use WebDriverWait(driver, 10, 1).until(), but the catch is that until() expects a function which it can execute repeatedly, every 1 second, for the timeout provided (in our case, 10 seconds). So keeping it like below worked for me:
wait_for_element = WebDriverWait(driver, 10, 1)
element_found = wait_for_element.until(lambda x: x.find_element_by_class_name("MY_ELEMENT_CLASS_NAME").is_displayed())
Here is what until() does behind the scenes:
def until(self, method, message=''):
    """Calls the method provided with the driver as an argument until the \
    return value is not False."""
    screen = None
    stacktrace = None
    end_time = time.time() + self._timeout
    while True:
        try:
            value = method(self._driver)
            if value:
                return value
        except self._ignored_exceptions as exc:
            screen = getattr(exc, 'screen', None)
            stacktrace = getattr(exc, 'stacktrace', None)
        time.sleep(self._poll)
        if time.time() > end_time:
            break
    raise TimeoutException(message, screen, stacktrace)
If you are trying to scroll and find all items on a page, you can consider using the following. It is a combination of a few methods mentioned by others here, and it did the job for me:
while True:
    try:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem1 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_1 = len(elem1)
        print(f"A list Length {len_elem_1}")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        driver.implicitly_wait(30)
        time.sleep(4)
        elem2 = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "element-name")))
        len_elem_2 = len(elem2)
        print(f"B list Length {len_elem_2}")
        if len_elem_1 == len_elem_2:
            print(f"final length = {len_elem_1}")
            break
    except TimeoutException:
        print("Loading took too much time!")
Selenium can't detect whether the page is fully loaded, but JavaScript can. I suggest you try this.
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
This executes JavaScript code instead of Python, because JavaScript can detect when the page is fully loaded: document.readyState will show 'complete'. This code means: for up to 100 seconds, keep checking document.readyState until 'complete' shows up.
nono = driver.current_url
driver.find_element(By.XPATH, "//button[@value='Send']").click()
while driver.current_url == nono:
    pass
print("page loaded.")
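The busy loop above pegs a CPU core while it spins. A small polling helper (a generic sketch, not a Selenium API) sleeps between checks and gives up after a timeout; the same predicate, e.g. `lambda: driver.current_url != nono`, plugs straight in:

```python
import time

def poll_until(predicate, timeout=10, poll=0.1):
    """Poll `predicate` with a short sleep between checks instead of
    busy-waiting; return True when it holds, False on timeout."""
    end = time.time() + timeout
    while not predicate():
        if time.time() > end:
            return False
        time.sleep(poll)
    return True
```

With a real driver: `if poll_until(lambda: driver.current_url != nono): print("page loaded.")`.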
