Hi, I'm new to web scraping and have been trying to use Selenium to scrape a forum in Python.
I am trying to get Selenium to click "Next" until the last page, but I am not sure how to break the loop, and I'm having trouble with the locator:
When I locate the next button by partial link text, the automated clicking continues into the next thread, e.g. page 1 -> page 2 -> next thread -> page 1 of next thread -> page 2 of next thread:
while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Next")))
    next_link.click()
When I locate the next button by class name, the automated clicking clicks the "prev" button when it reaches the last page:
while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
    next_link.click()
My questions are:
Which locator should I use? (By class, by partial link text, or any other suggestion?)
How do I break the loop so it stops clicking when it reaches the last page?
There are a couple of things you need to consider:
There are two elements on the page with the text Next, one at the top and another at the bottom, so you need to decide which element you want to interact with and construct a unique locator strategy for it.
As you want to invoke click() on the element, instead of the expected condition presence_of_element_located() you need to use element_to_be_clickable().
When there is no element with the text Next left, you still need to execute the remaining steps, so invoke click() within a try-except block and break out in case of an exception.
For your requirement, xpath as a locator strategy looks good to me.
Here is the working code block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://forums.hardwarezone.com.sg/money-mind-210/hdb-fully-paid-up-5744914.html")
driver.find_element_by_xpath("//a[@id='poststop' and @name='poststop']//following::table[1]//li[@class='prevnext']/a").click()
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='poststop' and @name='poststop']//following::table[1]//li[@class='prevnext']/a[contains(.,'Next')]"))).click()
    except TimeoutException:
        print("No more pages left")
        break
driver.quit()
Console output:
No more pages left
You can use the code below to click the Next button until the last page is reached, and break the loop if the button is not present:
from selenium.common.exceptions import TimeoutException
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ›"))).click()
    except TimeoutException:
        break
You can use any locator which gives a unique identification. Best practice suggests the following order (a short sketch follows the list):
Id
Name
Class Name
Css Selector
Xpath
Others
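For illustration, here is a minimal sketch locating the same hypothetical login button with each strategy; the id, name, and class values are assumptions, not taken from the question:
from selenium.webdriver.common.by import By

driver.find_element(By.ID, "login")                     # 1. Id
driver.find_element(By.NAME, "login")                   # 2. Name
driver.find_element(By.CLASS_NAME, "login-button")      # 3. Class Name
driver.find_element(By.CSS_SELECTOR, "button#login")    # 4. Css Selector
driver.find_element(By.XPATH, "//button[@id='login']")  # 5. Xpath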
To come out of the while loop when it cannot find the element, you can use a try block as given below; the break statement is used for that.
from selenium.common.exceptions import TimeoutException

while True:
    try:
        next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
        next_link.click()
    except TimeoutException:
        break
I'm trying to program a sequence of events in Selenium, each dependent on the previous one. First click login, which loads a new page; then click a scrollbox on that page; then click a button inside the scrollbox which won't be loaded until the scrollbox has been clicked.
I am trying to stop using time.sleep(x), as I've read this is bad practice, and I'm trying to learn more about how Selenium works.
The code I've got that doesn't work is:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By

login_box = WebDriverWait(driver, 5).until(
    ec.element_to_be_clickable((By.XPATH, "//*[@id=\"loginAppRoot\"]/div[1]/div[1]/button/span")))
login_box.click()
print("login")
scroll_box = WebDriverWait(driver, 5).until(
    ec.element_to_be_clickable((By.XPATH, "//*[@id=\"searchBarFilterDropdown\"]")))
scroll_box.click()
box_inside_scroll = WebDriverWait(driver, 5).until(
    ec.element_to_be_clickable((By.XPATH, "//*[@id=\"global-header\"]/nav[1]/div[2]/div/div[1]/ul/li[44]/a")))
box_inside_scroll.click()
The only way I can get this to work is to put a time.sleep(2) before scroll_box.click(). From my understanding, WebDriverWait and the expected condition should negate the need for time.sleep. Could anyone help me remove the pre-defined wait times?
If you need Selenium to wait longer, you just have to change the timeout. If all you need is 2 seconds more, then try:
scroll_box = WebDriverWait(driver, 7).until(
    ec.element_to_be_clickable((By.XPATH, "//*[@id=\"searchBarFilterDropdown\"]")))
scroll_box.click()
When using WebDriverWait, the first argument in the parentheses is your driver and the second is the number of seconds you want Selenium to wait before it times out. Instead of 7 you can try 10, so it waits 10 seconds before attempting to click the scroll box.
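For reference, a minimal sketch of those parameters; poll_frequency is an optional third argument that defaults to 0.5 seconds, so Selenium re-checks the condition twice per second until it passes or the timeout expires:
# driver first, then the timeout in seconds, then the optional polling interval
scroll_box = WebDriverWait(driver, 10, poll_frequency=0.5).until(
    ec.element_to_be_clickable((By.XPATH, "//*[@id=\"searchBarFilterDropdown\"]")))
scroll_box.click()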
Changing the timeout didn't have any effect for me, so here's another answer. It's not good practice, but it saves you time.
I want to click on the element "Circuit Breaker control", and I need the page to be loaded first, so I do:
import time

page_loaded = False
while not page_loaded:
    try:
        WebDriverWait(self.driver, 40).until(EC.element_to_be_clickable((By.LINK_TEXT, "Circuit Breaker control"))).click()
        page_loaded = True
    except:
        time.sleep(1)
The error I get when it does not wait long enough:
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a data-toggle="tab" onclick="getSelectedTab('tab_cblbs_position')" href="#tab_cblbs_position">...</a> is not clickable at point (188, 254). Other element would receive the click: <div class="waitMe" data-waitme_id="100" style="background: rgba(255, 255, 255, 0.7);">...</div>
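Instead of sleeping and retrying, a hedged alternative is to wait for the blocking overlay to disappear before clicking, assuming the overlay is the waitMe div shown in the error above:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# wait for the overlay that intercepts the click to go away, then click
WebDriverWait(self.driver, 40).until(
    EC.invisibility_of_element_located((By.CLASS_NAME, "waitMe")))
WebDriverWait(self.driver, 40).until(
    EC.element_to_be_clickable((By.LINK_TEXT, "Circuit Breaker control"))).click()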
I hope it helps.
I have a script that scrapes 10 pages, one after another.
#hyperlink_list is the list of the pages
from time import sleep
from random import randint
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

for i in range(0, 10):
    url = hyperlink_list[i]
    sleep(randint(10, 24))
    driver.get(url)
    sleep(10)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
Now from the pages, I am extracting this part:
On some pages only, there is a show more link where the description is longer. I want to click this link and extract the description whenever the show more link is available.
Code for show more link:
<a id="rfq-info-header-description-showmorebutton">
show more
</a>
I want to click this link only if it's available; otherwise it will throw an element-not-found error.
Use more = driver.find_element_by_id("rfq-info-header-description-showmorebutton") (assuming the more link can always be found using this id). If the more button is not found, this will throw an exception. (see here for details)
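For example, a minimal sketch guarding the lookup with a try-except, assuming you simply want to skip pages without the link:
from selenium.common.exceptions import NoSuchElementException

try:
    more = driver.find_element_by_id("rfq-info-header-description-showmorebutton")
    more.click()
except NoSuchElementException:
    pass  # no "show more" link on this page; the short description is complete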
You should use a try-except block and look for the show more web element. Below I am using find_elements (plural) and len() to get the size; if it is > 0 then the web element must be present, and then I try to click on it using explicit waits.
If the size is not > 0 then show more should not be visible, and I just print a simple statement in that block.
Code:
try:
    if len(driver.find_elements(By.XPATH, "//a[@id='rfq-info-header-description-showmorebutton']")) > 0:
        print("Show more link is available so Selenium bot will click on it.")
        WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='rfq-info-header-description-showmorebutton']"))).click()
        print('Clicked on show more link')
    else:
        print("Show more link is not available")
except:
    print('Something else went wrong.')
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I am trying to scrape information from a website where the information is not immediately present. When you click a certain button, the page begins to load new content at the bottom, and after it's done loading, red text shows up as "Assists (At Least)". The script is able to find the first button, "Go to Prop builder", which doesn't immediately show up on the page; but after it clicks that button, it times out when trying to find the "Assists (At Least)" text, even though the script sleeps and the text is present on the screen.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://www.bovada.lv/sports/basketball/nba')
# this part succeeds
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//span[text()='Go to Prop builder']")
    )
)
element.click()
time.sleep(5)
# this part fails
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[text()='Assists (At Least)']")
    )
)
time.sleep(2)
innerHTML = driver.execute_script('return document.body.innerHTML')
driver.quit()
soup = BeautifulSoup(innerHTML, 'html.parser')
The problem is that the Assists element is inside a frame. You need to switch to the frame like this:
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "player-props-frame")))
driver.switch_to.frame(frame)
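Remember to switch back afterwards, otherwise every later lookup stays scoped to the frame:
# return to the top-level document once you are done inside the frame
driver.switch_to.default_content()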
First, increase the timeout to confirm the value provided is sufficient; you can also confirm this in debug mode. If the issue still persists, check that the "Assists (At Least)" element does not fall under any frame.
You can also share the DOM and the exact error message if the issue is not resolved.
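A quick, hedged way to check for frames from the script itself is to list every iframe on the page and inspect its attributes:
# print each iframe's class and src to see whether the target could be inside one
for frame in driver.find_elements_by_tag_name("iframe"):
    print(frame.get_attribute("class"), frame.get_attribute("src"))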
I have a couple of suggestions you could try:
Make sure that the content loaded at the bottom of the page is not in a frame. If it is, you need to switch to that particular frame.
Check that the XPath is correct by trying to match it from the developer console.
Inspect the element from the browser; once the developer console is open, press Ctrl+F and try your XPath. If it's not highlighting anything, check for frames.
Check if there are any iframes on the page by searching for iframe in the page source; if you find one for the field you are looking for, switch to that frame first:
driver.switch_to.frame("name of the iframe")
Try adding retry logic with a timeout, and use the refresh button if the page has one:
import time

st = time.time()
while st + 180 > time.time():
    try:
        element2 = WebDriverWait(driver, 6).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//*[text()='Assists (At Least)']")
            )
        )
        break  # element found, stop retrying
    except:
        pass
The content you want is in an iFrame. You can access it by switching to it first, like this:
iframe=driver.find_element_by_css_selector('iframe[class="player-props-frame"]')
driver.switch_to.frame(iframe)
Round brackets are the issue here (at least in some cases). If possible, use the contains() selector:
//*[contains(text(),'Assists ') and contains(text(),'At Least')]
I am having trouble selecting a "load more" button on a LinkedIn page. I receive this error when locating it by XPath: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
I suspect the issue is that the button is not visible on the page at that time, so I have tried actions.move_to_element. However, the page scrolls to just below the element, so the element is still not visible, and the same error subsequently occurs.
I have also tried move_to_element_with_offset, but this hasn't changed where the page scrolls to.
How can I scroll to the right location on the page so that I can successfully select the element?
My relevant code:
import parameters
from time import sleep
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver

ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('C:\\Users\\Root\\Downloads\\chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys(parameters.linkedin_username)
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys(parameters.linkedin_password)
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
After playing around with it, I seem to have figured out where the problem is stemming from. The error
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]"}
(Session info: chrome=81.0.4044.113)
always correctly states the problem it's encountering, and as such it was not able to find the element. The possible causes of this include:
Element not present at the time of execution
Dynamically generated content
Conflicting names
In your case, it was the second point, as the displayed content is loaded dynamically as you scroll down. So when your profile first loads, the skills section isn't actually present in the DOM. To solve this, you simply have to scroll to that section so it gets added to the DOM.
This line is the trick here. It positions the page at the correct panel, thus loading the data and applying it to the DOM.
driver.execute_script("window.scrollTo(0, 1800)")
Here's my code (Please change it as necessary)
from time import sleep
# import parameters
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
ChromeOptions = webdriver.ChromeOptions()
driver = webdriver.Chrome('../chromedriver.exe')
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
sleep(0.5)
username = driver.find_element_by_name('session_key')
username.send_keys('')
sleep(0.5)
password = driver.find_element_by_name('session_password')
password.send_keys('')
sleep(0.5)
sign_in_button = driver.find_element_by_xpath('//button[@class="btn__primary--large from__button--floating"]')
sign_in_button.click()
driver.get('https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk')
sleep(3)
driver.execute_script("window.scrollTo(0, 1800)")
sleep(3)
loadmore_skills = driver.find_element_by_xpath('//button[@class="pv-profile-section__card-action-bar pv-skills-section__additional-skills artdeco-container-card-action-bar artdeco-button artdeco-button--tertiary artdeco-button--3 artdeco-button--fluid"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore_skills).perform()
#actions.move_to_element_with_offset(loadmore_skills, 0, 0).perform()
loadmore_skills.click()
Update
In regard to your newer problem, you need to implement a continuous scroll method that would let you dynamically update the skills section. This requires a lot of change and should ideally be asked as another question.
I have also found a simple solution: setting the scroll to the correct threshold. A value of y=3200 seems to work fine for all the profiles I've checked, including yours, mine, and a few others.
driver.execute_script("window.scrollTo(0, 3200)")
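If a fixed threshold feels fragile, here is a rough sketch of the continuous-scroll idea; the step size, limit, and the shortened contains() locator are assumptions, not tested values:
from time import sleep
from selenium.common.exceptions import NoSuchElementException

# scroll down in steps until the button is in the DOM or we give up
for y in range(0, 6400, 800):
    driver.execute_script("window.scrollTo(0, arguments[0])", y)
    sleep(1)  # give the lazily loaded section a moment to attach
    try:
        loadmore_skills = driver.find_element_by_xpath(
            '//button[contains(@class, "pv-skills-section__additional-skills")]')
        break
    except NoSuchElementException:
        continue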
If the button is not visible on the page at load time, use the until method to delay the execution:
try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Button is ready!")
except TimeoutException:
    print("Loading took too much time!")
Example is taken from here
To get the exact location of the element, you can use the following method to do so.
element = driver.find_element_by_id('some_id')
element.location_once_scrolled_into_view
This returns the coordinates (x, y) of the element on the page, but it also scrolls the page right down to the target element. You can then click the button. You can read more on that here.
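Applied to the question's button, a minimal sketch might look like this; the shortened contains() locator is an assumption:
loadmore_skills = driver.find_element_by_xpath(
    '//button[contains(@class, "pv-skills-section__additional-skills")]')
# reading the property scrolls the button into view as a side effect
location = loadmore_skills.location_once_scrolled_into_view
loadmore_skills.click()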
You get a NoSuchElementException error when the locator (i.e. id / XPath / name / class name / CSS selector, etc.) mentioned in the Selenium code is unable to find the web element on the web page.
How to resolve NoSuchElementException (a combined sketch follows the list):
Apply WebDriverWait: allow the webdriver to wait for a specific time
Use a try-catch block
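A minimal sketch combining both suggestions; the 10-second timeout and the Show more XPath are assumptions borrowed from the answer below:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

try:
    # wait up to 10 seconds for the button to become clickable, then click it
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//span[text()='Show more']"))).click()
except TimeoutException:
    print("The element never became clickable")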
Before performing an action on a web element, you need to bring the element into view. I have removed the unwanted code and avoided hardcoded waits, since they are not a good way to deal with synchronization issues. Also, before clicking on the show more button you have to scroll down, otherwise it will not work.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path="path of chromedriver.exe")
driver.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
driver.maximize_window()
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_key"))).send_keys("email id")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.NAME, "session_password"))).send_keys("password ")
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//button[@class='btn__primary--large from__button--floating']"))).click()
driver.get("https://www.linkedin.com/in/kate-yun-yi-wang-054977127/?originalSubdomain=hk")
driver.maximize_window()
driver.execute_script("scroll(0, 250);")
buttonClick = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//span[text()='Show more']")))
ActionChains(driver).move_to_element(buttonClick).click().perform()
I've created a script in Python to scrape the content populated upon initiating a search in the search box on Google Maps. My script can generate results by pressing that search button. Now I wish to keep parsing the results by pressing the next button (located at the bottom left) until there are none left.
Site address
I'm using motels in new jersey as the search keyword.
I've tried with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.google.com/maps/search/")
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input#searchboxinput"))).send_keys("motels in new jersey")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#searchbox-searchbutton"))).click()
while True:
    for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".section-result-content"))):
        name = WebDriverWait(item, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3[class='section-result-title'] > span"))).text
        print(name)
    try:
        next_page = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[jsaction$='.nextPage'] > span")))
        driver.execute_script("arguments[0].click();", next_page)
    except TimeoutException:
        break
driver.quit()
The above script gives me the same results (from the first page) several times, no matter how far it goes clicking on that next button.
How can I get the correct results from the next pages?
Here is the logic that should work.
A server error (an application issue) occurs while navigating through the list, so we wait for the page to load the information and then check whether a server error is displayed; if not, we continue populating the results.
driver.get("https://www.google.com/maps/search/")
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input#searchboxinput"))).send_keys("motels in new jersey")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#searchbox-searchbutton"))).click()
while True:
    # wait until the information is loaded
    wait.until_not(EC.presence_of_element_located((By.XPATH, "//div[@id='searchbox'][contains(@class,'loading')]")))
    # check if there is any server error
    if len(driver.find_elements_by_xpath("//div[@class='snackbar-message'][contains(.,'error')]")) > 0:
        # print the error message
        print(driver.find_element_by_xpath("//div[@class='snackbar-message'][contains(.,'error')]").text)
        # exit the loop
        break
    for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".section-result-content"))):
        name = WebDriverWait(item, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3[class='section-result-title'] > span"))).text
        print(name)
    try:
        next_page = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[jsaction$='.nextPage'] > span")))
        driver.execute_script("arguments[0].click();", next_page)
    except TimeoutException:
        break
Being in a while True loop, your script does not wait for the next page to be rendered before searching for the name. The locators input#searchboxinput and button#searchbox-searchbutton are still active while the next page is loading, so your script will output the same names from the same page for as many iterations as run before the next page is loaded.
I recommend a wait condition for the page load, such as the presence of the spinner animation in the top left where the X button usually is. This should pause execution until the next page is loaded. The div with id searchbox has a show-loading class that appears only while that spinner is active; you can use that to determine whether the page is still loading, as sketched below.
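A hedged sketch of that wait, assuming the show-loading class toggles as described (in practice the spinner can come and go quickly, so the first wait may need a short timeout):
# wait for the spinner to appear (the next page started loading) ...
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "div#searchbox.show-loading")))
# ... then wait for it to disappear (the next page finished rendering)
wait.until_not(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "div#searchbox.show-loading")))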