Selenium: wait until text NOT to be present in element - python

I want to extract the article titles from a webpage with a multi-page list of articles.
I get the article titles on the first page using:
titles = browser.find_elements_by_xpath(r'path')
for i in range(len(titles)):
    titles_list.append(titles[i].text)
I navigate to the next page using:
next_page = browser.find_element_by_xpath(r'path')
next_page.click()
Then, I return to the first step (i.e. getting the article titles).
The problem is that, using the code above, I sometimes get a page's article titles twice and sometimes miss a page's titles entirely.
I believe the solution is to wait until the page has fully loaded after the second step and before repeating the first step: I should store something unique to the first page (e.g. the first article's title) in a variable (e.g. first_item), and then wait until the corresponding element no longer contains that text.
I found an answer to my question in Java, which used ExpectedConditions.not, but the equivalent code below (the EC.not() part) is not valid in Python and raises a SyntaxError:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
next_page.click()
wait = WebDriverWait(browser, 10)
wait.until(EC.not(EC.text_to_be_present_in_element((By.XPATH, r'path'), first_item)))
How can I wait until a text is not present in an element in Python?

You can wait like this:
element = WebDriverWait(driver, 6).until_not(EC.element_to_be_clickable((By.XPATH, 'xpath')))
while element == True:
    try:
        element.click()
    except:
        pass
It looks a bit odd, but it will wait until the element is found; otherwise the loop will continue.
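For the text-based wait the question actually describes, WebDriverWait also has an until_not() method, which plays the role of Java's ExpectedConditions.not. Here is a minimal sketch, reusing the placeholder XPath r'path' and the first_item idea from the question (untested against the actual page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

first_item = titles[0].text  # something unique to the page we are about to leave
next_page.click()

wait = WebDriverWait(browser, 10)
# blocks until the element no longer contains first_item, or raises TimeoutException after 10 s
wait.until_not(EC.text_to_be_present_in_element((By.XPATH, r'path'), first_item))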

Related

Unable to click Next button using selenium as the number of pages is unknown

I am new to Selenium and am trying to scrape:
https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/
I need all the details mentioned on this page and on the other pages as well.
There are more pages containing the same kind of information, and I need to scrape them too. I tried changing the target URL:
https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/40
but the last number changes and does not even correspond to the page number. Page 3 ends in 40, while page 5 is:
https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/80
so I am not able to get the data that way.
Here is my code:-
def extract_url():
    url = driver.find_elements(By.XPATH, "//h2[@class='resultTitle']//a")
    for i in url:
        dist.append(i.get_attribute("href"))
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    driver.find_element(By.XPATH, "//li[@class='btnNextPre']//a").click()

for _ in range(10):
    extract_url()
It works fine up to page 5 but not after that. Could you please suggest how I can iterate over the pages when we don't know how many there are, and extract the data up to the last page?
You need to check whether the pagination link is disabled. Use an infinite loop and break once the pagination button is disabled.
Use WebDriverWait() and wait for visibility of the element.
Code:
driver.get("https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/")
counter=1
while(True):
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h2.resultTitle >a")))
urllist=[item.get_attribute('href') for item in driver.find_elements(By.CSS_SELECTOR, "h2.resultTitle >a")]
print(urllist)
print("Page number :" +str(counter))
driver.execute_script("arguments[0].click();", driver.find_element(By.CSS_SELECTOR, "ul.pagination >li.btnNextPre>a"))
#check for pagination button disabled
if len(driver.find_elements(By.XPATH, "//li[#class='disabled']//a[text()='>']"))>0:
print("pagination not found!!!")
break
time.sleep(2) #To slowdown the loop
counter=counter+1
Import the libraries below:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time

How to fix "stale element reference: element is not attached to the page document" while _scraping_ with selenium

Good evening modern-day heroes, hope everyone's safe and sound!
What I'm hoping to achieve with this Selenium script is to load the page, click the BTC, ETH, and XRP icons to filter the results, keep clicking the "show more" button until the maximum number of elements has been loaded (1138), then collect the hrefs of those 1138 companies, visit each one, and scrape further data points from each internal page.
With that said, I've tried lots of different approaches, including simply printing the link of each company, which worked; however, the script fails to actually visit the extracted hrefs and raises "stale element reference: element is not attached to the page document".
I've heard that explicit/implicit waits could help fix this, but I can't wrap my head around how to use them, particularly with the links variable, which is where the code stops and throws the error above.
I have a feeling the issue is with the while loop and the fact that I'm iterating over a list of links that will be visited next. I can't emphasize enough how grateful I'll be if someone can guide me in the right direction!
from selenium.webdriver import Chrome
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from selenium.common.exceptions import NoSuchElementException, ElementNotVisibleException
webdriver = '/Users/karimnabil/projects/selenium_js/chromedriver-1'
driver = Chrome(webdriver)
url = 'https://acceptedhere.io/catalog/company/'
driver.get(url)
btc = driver.find_element_by_xpath("//ul[@role='currency-list']/li[1]/a")
btc.click()
eth = driver.find_element_by_xpath("//ul[@role='currency-list']/li[2]/a")
eth.click()
xrp = driver.find_element_by_xpath("//ul[@role='currency-list']/li[5]/a")
xrp.click()
all_categories = driver.find_element_by_xpath("//div[@class='dropdownMenu']/ul/li[1]")
all_categories.click()
time.sleep(1)
maximun_number = 1138
while True:
    show_more = driver.find_element_by_xpath("//div[@class='row search-result']/div[3]/button")
    elements = driver.find_elements_by_xpath("//div[@class='row desktop-results mobile-hide']/div")
    if len(elements) > maximun_number:
        break
    show_more.click()
    time.sleep(1)
for element in elements:
    links = element.find_elements_by_xpath(".//div/div/div[2]/div/div/div[1]/a")
    links = [url.get_attribute('href') for url in links]
    time.sleep(0.5)
    for link in links:
        driver.get(link)
        company_title = driver.find_element_by_xpath("//h3").text
        print(company_title)
When you navigate through pages, the elements you stored in your variables (e.g. show_more) become stale, since you are now on a different page. You may need to wait for an element to load or to be clickable. Here are some examples:
https://seleniumbyexamples.github.io/waitclickable
https://seleniumbyexamples.github.io/waitvisibility
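Another way to sidestep the stale reference in this particular script is to extract the href strings before navigating away, since plain strings never go stale. A rough sketch based on the question's own XPaths (not tested against the live site):
hrefs = []
for element in elements:
    for a in element.find_elements_by_xpath(".//div/div/div[2]/div/div/div[1]/a"):
        hrefs.append(a.get_attribute("href"))

for href in hrefs:
    driver.get(href)
    # wait for the heading on the company page before reading it
    company_title = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, "//h3"))
    ).text
    print(company_title)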

How to Skip a Webpage After a Period of Time Selenium

I am parsing a file with a ton of colleges. Selenium googles "Admissions " + college_name then clicks the first link and gets some data from each page. The issue is that the list of college names I am pulling from is very rough (technically a list of all accredited institutions in America), so some of the links are broken or get stuck in a load loop. How do I set some sort of timer that basically says
if page load time > x seconds:
    go to next element in list
You could invoke WebDriverWait on the page, and if a TimeoutException is raised you will know the page took too long to load, so you can proceed to the next one.
Given you do not know what each page HTML will look like, this is a very challenging problem.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

# list of college names
names = []

for name in names:
    # search for the college here

    # get list of search results
    WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='rc']")))
    search_results = driver.find_elements_by_xpath("//div[@class='rc']")

    # get first result
    search_result = search_results[0]

    # attempt to load the page
    try:
        search_result.click()
    except TimeoutException:
        # click operation should time out if next page does not load
        # pass to move on to next URL
        pass
This is a very rough, general outline. As I mentioned, without knowing what the expected page title will be, or what the expected page content will look like, it's incredibly difficult to write a generic method that will successfully accomplish this. This code is meant to be just a starting point for you.
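One detail worth adding: for the click to actually raise TimeoutException when a page hangs, the driver needs a page load timeout set beforehand. A small sketch, assuming a 10-second budget per page (adjust to taste):
from selenium.common.exceptions import TimeoutException

# any navigation (driver.get, or a click that triggers one) longer than this raises TimeoutException
driver.set_page_load_timeout(10)

try:
    search_result.click()
except TimeoutException:
    # the page took too long to load; skip to the next college in the list
    pass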

How to loop from a list of urls by clicking the xpath and extract data using Selenium in Python?

I am extracting board members from a list of URLs. For each url in the URL_lst, click the first xpath (ViewMore to expand the list), then extract values from the second xpath (BoardMembers' info).
Below are the three companies I want to extract info: https://www.bloomberg.com/quote/FB:US, https://www.bloomberg.com/quote/AAPL:US, https://www.bloomberg.com/quote/MSFT:US
My code is shown below but doesn't work: the Outputs list is not aggregated. I know something is wrong with the loop but don't know how to fix it. Can anyone tell me how to correct the code? Thanks!
URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']
Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')
for url in URL_lst:
    driver.get(url)
    for c in driver.find_elements_by_xpath("//*[@id='root']/div/div/section[3]/div[10]/div[2]/div/span[1]"):
        c.click()
        for e in c.find_elements_by_xpath('//*[@id="root"]/div/div/section[3]/div[10]/div[1]/div[2]/div/div[2]')[0].text.split('\n'):
            Outputs.append(e)
print(Outputs)
Based on the URLs you provided, I did some refactoring for you. I added a wait on each item you are trying to click and a scrollIntoView JavaScript call to scroll down to the View More button. You were originally clicking View More buttons in a loop, but your XPath only returned one element, so the loop was redundant.
I also refactored your selector for board members to query directly on the div element containing their names. Your original query was finding a div several levels above the actual name text, which is why your Outputs list was returning empty.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep

URL_lst = ['https://www.bloomberg.com/quote/FB:US','https://www.bloomberg.com/quote/AAPL:US','https://www.bloomberg.com/quote/MSFT:US']
Outputs = []
driver = webdriver.Chrome(r'xxx\chromedriver.exe')
wait = WebDriverWait(driver, 30)
for url in URL_lst:
    driver.get(url)
    # get "Board Members" header
    board_members_header = wait.until(EC.presence_of_element_located((By.XPATH, "//h2[span[text()='Board Members']]")))
    # scroll down to board members
    driver.execute_script("arguments[0].scrollIntoView();", board_members_header)
    # get View More button
    view_more_button = wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View More']]")))
    # click View More button
    view_more_button.click()
    # wait on 'View Less' to exist, meaning the list is expanded now
    wait.until(EC.presence_of_element_located((By.XPATH, "//section[contains(@class, 'PageMainContent')]/div/div[2]/div/span[span[text()='View Less']]")))
    # wait on visibility of board member names
    wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")))
    # get list of board member names
    board_member_names = driver.find_elements_by_xpath("//div[contains(@class, 'boardWrap')]//div[contains(@class, 'name')]")
    for board_member in board_member_names:
        Outputs.append(board_member.text)
    # explicit sleep to avoid being flagged as a bot
    sleep(5)
print(Outputs)
I also added an explicit sleep between URL grabs, so that Bloomberg does not flag you as a bot.

Selenium for Python - text_to_be_present for partial text matching

I have a div which contains the results for a certain search query. The text contained in this div changes as a button to go to the next page is clicked.
In the text contained in this div there is also the number of the current page. After clicking the button to go to the next page, the results still take a moment to load, so I want to make the driver wait for the content to load.
As such, I want to wait until the string "Page x" appears inside the div, where x corresponds to the number of the next page to be loaded.
For this, I tried:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Search page
driver.get('http://searchpage.com')
timeout = 120

current_page = 1
while True:
    try:
        # Wait until the results have been loaded
        WebDriverWait(driver, timeout).until(
            EC.text_to_be_present_in_element(
                locator=(By.ID, "SEARCHBASE"),
                text_="Page {:}".format(current_page)))
        # Click to go to the next page. This calls some javascript.
        driver.find_element_by_xpath('//a[@href="#"]').click
        current_page += 1
    except:
        driver.quit()
However, this always fails to match the text. What am I doing wrong here?
Just detecting whether anything at all has changed in the page would also do the job, but I haven't found a way to do that either.
Try the solution below to wait for a partial text match:
WebDriverWait(driver, timeout).until(lambda driver: "Page {:}".format(current_page) in driver.find_element(By.ID, "SEARCHBASE").text)
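For context, here is roughly how that lambda could slot into the loop from the question (same SEARCHBASE id and current_page counter; note that .click needs parentheses, otherwise the method is never actually called):
from selenium.common.exceptions import TimeoutException

current_page = 1
while True:
    try:
        # custom condition: succeeds as soon as the partial text appears in the element
        WebDriverWait(driver, timeout).until(
            lambda driver: "Page {:}".format(current_page) in driver.find_element(By.ID, "SEARCHBASE").text)
        driver.find_element_by_xpath('//a[@href="#"]').click()
        current_page += 1
    except TimeoutException:
        driver.quit()
        break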
