Python Selenium Firefox back-button behavior

I am using a Firefox browser with Selenium. I am scraping a website that has multiple pages, like a Google search, where you pick the page at the bottom. On each page I click an element (again like Google) and scrape data from that element's detail view. If I am on an element's detail view on the third page and click the back button in a regular Firefox browser, it goes back to the third page. But when I call driver.back() in Selenium, it takes me back to the first page. Does anyone know how to fix this?
count = 1
while 1:
    try:
        pages = driver.find_elements_by_css_selector("a.page-number.gradient")
    except:
        break
    for page in pages:
        if page.text == str(count):
            page.click()
            print count
            break
    states = driver.find_elements_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr/td[19]")
    fails = []
    i = 1
    for state in states:
        if state.text == "FAILED":
            fails.append(i)
        i += 1
    for fail in fails:
        print driver.find_element_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[19]").text
        driver.find_element_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[1]/input").click()
        time.sleep(2)
        errors = driver.find_element_by_name("errors")
        if "\n" in errors.text:
            fixedText = errors.text.split("\n")[0]
            errors.clear()
            errors.send_keys(fixedText)
            time.sleep(1)
            driver.find_element_by_name('post_type').click()
            time.sleep(5)
            driver.switch_to_alert().accept()
            driver.switch_to_alert().accept()
            driver.back()
            driver.back()
        else:
            driver.back()
            driver.switch_to_alert().accept()
            driver.switch_to_alert().accept()
    count += 1
The code is fairly involved, but essentially it is the driver.back() calls that are not behaving as expected.
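A likely cause is that clicking a page-number link updates the results with JavaScript rather than a full navigation, so pages 2 and 3 never enter the browser history and driver.back() skips straight back to page 1. A minimal sketch of a workaround (assuming the a.page-number.gradient links from the code above) is to re-select the page explicitly after going back instead of relying on the history stack:

# Sketch of a workaround: after driver.back(), explicitly re-open the page we were
# on instead of trusting the history stack. Assumes the listing page exposes the
# page-number links from the question ("a.page-number.gradient").
def go_to_page(driver, number):
    for link in driver.find_elements_by_css_selector("a.page-number.gradient"):
        if link.text == str(number):
            link.click()
            return True
    return False

driver.back()            # return to the listing (may land on page 1)
go_to_page(driver, 3)    # then re-select the page we were actually on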

Related

I use Selenium (Python) to retrieve data from WOS top papers, but when I use click() to open the sub-links, only the first URL opens

My task is to open each URL on the following website and retrieve some evaluation data for each paper. I have located the elements successfully, so I get 10 elements. However, when Selenium starts clicking the links the way a human would, it can only open the first of the ten links.
https://esi.clarivate.com/DocumentsAction.action
The code is as follows.
import time
from selenium import webdriver

driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get('https://esi.clarivate.com/IndicatorsAction.action?Init=Yes&SrcApp=IC2LS&SID=H3-M1jrs4mSS2O3WTFbtdrUJugtDvogGRIM-18x2dx2B1ubex2Bo9Y5F6ZPQtUZbfUAx3Dx3Dp1StTsneXx2B7vu85UqXoaoQx3Dx3D-03Ff2gF3hTJGBPDScD1wSwx3Dx3D-cLUx2FoETAVeN3rTSMreq46gx3Dx3D')
# add filter -> research fields -> "clinical medicine"
target = driver.find_element_by_id("ext-gen1065")
time.sleep(1)
target.click()
time.sleep(1)
n = driver.window_handles
driver.switch_to.window(n[-1])
links = driver.find_elements_by_class_name("docTitle")
length = len(links)
for i in range(0, length):
    item = links[i]
    item.click()
    time.sleep(1)
    handles = driver.window_handles
    index_handle = driver.current_window_handle
    for handle in handles:
        if handle != index_handle:
            driver.switch_to.window(handle)
        else:
            continue
    time.sleep(1)
    u1 = driver.find_elements_by_class_name("large-number")[2].text
    u2 = driver.find_elements_by_class_name("large-number")[3].text
    print(u1, u2)
    print("\n")
    driver.close()
    time.sleep(1)
    driver.switch_to_window(index_handle)
driver.quit()
print("————finished————")
Clicking the second link only leads to an error page. I tried to narrow the problem down by testing this code:
links=driver.find_elements_by_class_name("docTitle")
length=len(links)
print(length)
print(links[1].text)
#links[0].click()
links[1].click()
The result shows that the element has already been found, but it fails to open. (When using links[0].text, it works fine.) Any idea what is going on?
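One likely explanation is that the references collected before the first click() go stale once a new window is opened and closed, so only the first stored element is still clickable. A minimal sketch of a workaround (my assumption: the list page keeps the docTitle class and each click opens a new window) is to re-find the links by index on every iteration and return to the original window before the next click:

# Sketch (assumptions: the list page keeps the "docTitle" class and each click
# opens a new window). Re-find the links on every pass so we never click a stale
# reference, and always return to the original window before continuing.
main_handle = driver.current_window_handle
count = len(driver.find_elements_by_class_name("docTitle"))
for i in range(count):
    links = driver.find_elements_by_class_name("docTitle")  # fresh lookup each time
    links[i].click()
    time.sleep(1)
    for handle in driver.window_handles:
        if handle != main_handle:
            driver.switch_to.window(handle)
    numbers = driver.find_elements_by_class_name("large-number")
    print(numbers[2].text, numbers[3].text)
    driver.close()                         # close the detail window
    driver.switch_to.window(main_handle)   # back to the list before the next click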

Blocking login overlay window when scraping web page using Selenium

I am trying to scrape a long list of books spread across 10 web pages. When the loop clicks the next > button for the first time, the website displays a login overlay, so Selenium cannot find the target elements.
I have tried all the solutions I could find:
Using various Chrome options.
Using try-except to click the X button on the overlay. But the overlay appears only once (when clicking next > for the first time), and when I put this try-except block at the end of the while True: loop it becomes infinite, because I use continue in the except clause since I do not want to break the loop.
Adding popup-blocker extensions to Chrome, but they do not take effect when I run the code even though I add the extension using options.add_argument('load-extension=' + ExtensionPath).
This is my code:
# imports needed by the snippet below
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import pandas as pd

options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')
driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)
review_dict = {'title': [], 'author': [], 'rating': []}
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_='tableList js-dataTooltip')
while True:
    table = driver.find_element_by_xpath('//*[@id="all_votes"]/table')
    for product in table.find_elements_by_xpath(".//tr"):
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)
        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)
        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
    except NoSuchElementException:
        continue
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
    except TimeoutException:
        break
df = pd.DataFrame.from_dict(review_dict)
df
Any help would be appreciated: can I change the while loop to a for loop that clicks the next > button until the end, where should I put the try-except block that closes the overlay, or is there a Chrome option that can disable the overlay?
Thanks in advance.
Thank you for sharing your code and the website you are having trouble with. I was able to close the login modal by using XPath. I took this challenge and broke the code up into class objects: one object wraps the selenium.webdriver.chrome.webdriver, and the other represents the page you want to scrape ( https://www.goodreads.com/list/show/32339 ). In the following methods, I used the JavaScript return arguments[0].scrollIntoView(); call to scroll to the last book displayed on the page. After that, I was able to click the next button.
def scroll_to_element(self, xpath: str):
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)

def get_book_count(self):
    return len(self.chrome_driver.find_elements(By.XPATH, "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"))

def click_next_page(self):
    # Scroll to the last record and click "next page"
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()
Once I clicked the "Next" button, I saw the modal appear. I found the XPath for the modal and was able to close it.
def is_displayed(self, xpath: str, timeout: int = 5):
    try:
        web_element = DriverWait(self.chrome_driver, timeout).until(
            DriverConditions.presence_of_element_located(locator=(By.XPATH, xpath))
        )
        return web_element is not None
    except:
        return False

def is_modal_displayed(self):
    return self.is_displayed("//body[@class='modalOpened']")

def close_modal(self):
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if self.is_modal_displayed():
        raise Exception("Modal Failed To Close")
I hope this helps you to solve your problem.
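For reference, a minimal sketch of how these methods might be wired together (the GoodreadsPage class name and chrome_driver attribute are my assumptions; only the method bodies come from the answer above):

# Hypothetical wiring of the answer's methods (the GoodreadsPage class name and
# chrome_driver attribute are assumptions; only the methods above come from the answer).
page = GoodreadsPage()          # wraps a webdriver.Chrome instance as page.chrome_driver
for _ in range(10):             # ten pages of results
    # ... scrape the rows of the current page here ...
    page.click_next_page()
    if page.is_modal_displayed():
        page.close_modal()      # dismiss the login overlay before continuing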

How to stop clicking the same button when the button always exists (Selenium)

I am facing an issue. Previously, when I scraped multiple pages with Selenium, I just clicked the next-page button and used NoSuchElementException to stop.
But on the URL I am facing now, the element always exists: on the last page, if I click the next-page button, it just reloads the current page.
Can anyone help me figure out how to stop clicking the same button?
items = driver.find_elements_by_class_name('item')
while True:
    try:
        # click next page
        driver.find_element_by_link_text('下一页').click()
        sleep(5)
        # scrape data here
        items = driver.find_elements_by_class_name('item')
        for i in range(0, len(items)):
            results.append(items[i])
            print(items[i])
    except NoSuchElementException:
        break
For the pagination details, see the screenshot attached to the original post.
[Edited]
You can solve it by comparing the current page URL with the URL you land on after clicking the next-page link.
If the URL does not change after clicking, you are on the last page; otherwise, continue scraping.
Keep a variable that stores the previous page URL; after Selenium clicks the next-page link, read the current URL and compare it with the stored one.
This is what I mean:
url = "https://humkinar.com.pk/"
driver.get(url)
items=driver.find_elements_by_class_name('item')
current_page_url = ""
prev_page_url = url
while True:
try:
driver.find_element_by_link_text('下一页').click()
current_page_url = driver.current_url
if current_page_url != prev_page_url:
time.sleep(5)
items=driver.find_elements_by_class_name('item')
for i in range(0, len(items)):
results.append(items[i])
print(items[i])
prev_page_url = current_page_url
else:
break
except NoSuchElementException:
break
As I see in the picture (I suppose the picture you shared is of the last page), check for className == 'disable' on the next-page anchor, i.e. <a class='disable'>...</a>, and break.
UPDATE:
items = driver.find_elements_by_class_name('item')
while True:
    try:
        next_link = driver.find_element_by_link_text('下一页')
        # stop once the next-page link is marked disabled (last page)
        if next_link.get_attribute('class') == 'disable':
            break
        # click next page
        next_link.click()
        sleep(5)
        # scrape data here
        items = driver.find_elements_by_class_name('item')
        for i in range(0, len(items)):
            results.append(items[i])
            print(items[i])
    except NoSuchElementException:
        break

Selenium click on a next-page link not loading the next page

I'm new to Selenium and web scraping, and I'm trying to get information from this link: https://www.carmudi.com.ph/cars/civic/distance:50km/?sort=suggested
Here's a snippet of the code I'm using:
while max_pages > 0:
    results.extend(extract_content(driver.page_source))
    next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
    driver.execute_script('arguments[0].click();', next_page)
    max_pages -= 1
When I print results, I always get max_pages copies of the same results from page 1. The "Next page" button is visible on the page, and when I search for elements of that class, only one element is found. When I locate the element by its exact XPath and perform the click action on it, it does not work either. I wrapped it in a try-except block, but no errors were raised. Why might this be?
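One possible explanation (an assumption, not confirmed against the site) is that page_source is read again before the new results have loaded, so the loop keeps extracting page 1. A minimal sketch of waiting for the old content to go stale after the click:

# Sketch of one possible fix: wait until an element from the old page goes stale
# after clicking "next" before extracting again, otherwise page_source may still
# contain page 1. Assumes the "next-page" div from the question's XPath.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

while max_pages > 0:
    results.extend(extract_content(driver.page_source))
    old_marker = driver.find_element_by_xpath('//div[@class="next-page"]')
    driver.execute_script('arguments[0].click();', old_marker)
    # block until the previous page's element has been replaced in the DOM
    WebDriverWait(driver, 10).until(EC.staleness_of(old_marker))
    max_pages -= 1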
You are making this more complicated than it needs to be. There's no point in using JS clicks here... just use the normal Selenium clicks.
while True:
    # do stuff on the page
    next_links = driver.find_elements_by_css_selector("a[title='Next page']")
    if next_links:
        next_links[0].click()
    else:
        break
replace:
next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
driver.execute_script('arguments[0].click();', next_page)
with:
driver.execute_script('next = document.querySelector(".next-page"); next.click();')
If you try next = document.querySelector(".next-page"); next.click(); in the browser console, you can see that it works.
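If you also want the Python loop to stop cleanly when the link is missing, one possible variant (still assuming the ".next-page" selector from this answer) is to have the injected JavaScript report whether it found the link:

# Sketch (assumes the ".next-page" selector above): click via JS and have the
# script report whether the link existed, so the Python loop can stop.
clicked = driver.execute_script(
    "var next = document.querySelector('.next-page');"
    "if (next) { next.click(); return true; }"
    "return false;"
)
if not clicked:
    print("No next-page link found; stopping.")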

How to save state of selenium web driver in python?

I am trying to scrape this website: http://www.infoempleo.com/ofertas-internacionales/.
I want to scrape with the "Last 15 days" radio button selected, so I wrote this code.
browser = webdriver.Chrome('C:\Users\Junaid\Downloads\chromedriver\chromedriver_win32\chromedriver.exe')
new_urls = deque(['http://www.infoempleo.com/ofertas-internacionales/'])
processed_urls = set()
while len(new_urls):
    print "------ URL LIST -------"
    print new_urls
    print "-----------------------"
    print
    time.sleep(5)
    url = new_urls.popleft()
    processed_urls.add(url)
    try:
        print "----------- Scraping ==>", url
        browser.get(url)
        elem = browser.find_elements_by_id("fechapublicacion")[-1]
        if elem.is_selected():
            print "already selected"
        else:
            elem.click()
        html = browser.page_source
    except:
        print "-------- Failed to Scrape, Moving to Next"
        continue
    soup = BeautifulSoup(html)
I have been able to select the radio button and scrape the first page.
There is a list of pages at the end, like 1, 2, 3...
When moving to the next page, browser.get(url) is called, which resets the radio button to 'Any Date' instead of 'Last 15 Days'. That makes the code execute the else branch (elem.click()) to select the radio button again, which reopens the first page that has already been scraped.
Is there a way around this? Help would be appreciated.
I have found a workaround for this problem. Instead of saving links to the next pages in a list, I select the next-page button/element and call .click() on it. That way browser.get(url) does not need to be called again, the page is not reloaded, and the radio-button selection is preserved.
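A minimal sketch of that workaround (the "a.next" selector is a hypothetical placeholder; use whatever locator matches the site's actual pagination link):

# Sketch of the workaround (the "a.next" selector is a hypothetical placeholder).
# Clicking the pagination link keeps the current session state, so the
# "Last 15 days" filter stays selected between pages.
while True:
    html = browser.page_source
    soup = BeautifulSoup(html)
    # ... extract offers from soup here ...
    next_links = browser.find_elements_by_css_selector("a.next")
    if not next_links:
        break          # no next-page link: we are on the last page
    next_links[0].click()
    time.sleep(5)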
