I'm new to Python and Selenium.
For fun, I'm scraping a page. I have to click a first button for Comment, and then another button for All comments so I can get them all.
The first click works, but the second does not.
I've added a hardcoded scroll, but it still doesn't work.
This is the python code I'm working on:
boton = driver.find_element_by_id('tabComments_btn')
boton.click()
wait = WebDriverWait(driver, 100)
# From here on, it doesn't work (it scrolls, but it says 'element can't be scrolled into view')
driver.execute_script("window.scrollTo(0, 1300)")
botonTodos= driver.find_element_by_class_name('thread-node-children-load-all-btn')
wait = WebDriverWait(driver, 100)
botonTodos.click()
If I only click the first button, I'm able to scrape the first 10 comments, so this is working.
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'thread-node-message')))
for elm in driver.find_elements_by_css_selector(".thread-node-message"):
    print(elm.text)
This is the part of the HTML I'm stuck in:
Load next 10 comments
Load all comments
Publicar un comentario
There's a whitespace node with the tag #text between each .
Any ideas welcome.
Thanks.
Here are the different options.
# First, create the element references
load_next_btn = driver.find_element_by_css_selector(".thread-node-children-load-next-btn")
load_all_btn = driver.find_element_by_css_selector(".thread-node-children-load-all-btn")
# Scroll to the button you are interested in (here, load_all_btn)
# Option 1
load_all_btn.location_once_scrolled_into_view
# Option 2
driver.execute_script("arguments[0].scrollIntoView();", load_all_btn)
# Option 3
btn_location = load_all_btn.location
driver.execute_script("window.scrollTo(" + str(btn_location['x']) + "," + str(btn_location['y']) + ");")
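Option 3 can also be written without string concatenation, since execute_script passes extra arguments to the script as arguments[0], arguments[1], and so on. This is only a sketch: the helper name scroll_to_element_location is made up for illustration, and it assumes the element's location dictionary has 'x' and 'y' keys, as in the code above.

```python
def scroll_to_element_location(driver, element):
    """Scroll the window to the element's page coordinates (Option 3).

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement;
    the helper name itself is hypothetical.
    """
    location = element.location  # dict with 'x' and 'y' keys
    # Pass the coordinates as script arguments instead of building a string.
    driver.execute_script(
        "window.scrollTo(arguments[0], arguments[1]);",
        location["x"],
        location["y"],
    )
```

It would be called as scroll_to_element_location(driver, load_all_btn) before clicking the button.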
Test Code:
Check if this code is working.
url = "https://stackoverflow.com/questions/55228646/python-selenium-cant-sometimes-scroll-element-into-view/55228932?noredirect=1#comment97192621_55228932"
driver.get(url)
element = driver.find_element_by_xpath("//a[.='Contact Us']")
element.location_once_scrolled_into_view
time.sleep(1)
driver.find_element_by_xpath("//p[.='active']").location_once_scrolled_into_view
driver.execute_script("arguments[0].scrollIntoView();",element)
time.sleep(1)
Related
This is my first time with Selenium, and the website I'm scraping (page) doesn't have a next-page button. The pagination links don't change until you click the "...", which then shows the next set of 10 pagination links. How do I loop through the clicking?
I've seen a few answers online, but I couldn't adapt them to my code because the links only come in sets. This is the code:
from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
driver_path = r'Projects\Selenium Driver\chromedriver_win32'
driver = Chrome(executable_path=driver_path)
driver.get('https://business.nh.gov/nsor/search.aspx')
drop_down = driver.find_element(By.ID, 'ctl00_cphMain_lstStates')
select = Select(drop_down)
select.select_by_visible_text('NEW HAMPSHIRE')
driver.find_element(By.ID, 'ctl00_cphMain_btnSubmit').click()
content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
hrefs = []
for link_el in content:
    href = link_el.get_attribute('href')
    hrefs.append(href)
offenders_href = hrefs[:10]
pagination_links = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender tbody tr td table tbody a')
With your current code, the pagination links are already captured in the list content[10:]. The last link, shown as an ellipsis, actually points to the next page in sequence. Using this fact, we can keep a current-page variable to track the page being visited and use it to identify the right anchor element in content for the next page.
With do-while style loop logic and your code for scraping the required elements, here is the primary code:
from time import sleep

offenders_href = list()
curr_page = 1
while True:
    # find all anchor tags within this table
    content = driver.find_elements(By.CSS_SELECTOR, 'table#ctl00_cphMain_gvwOffender a')
    hrefs = []
    for link_el in content:
        href = link_el.get_attribute('href')
        hrefs.append(href)
    offenders_href += hrefs[:10]
    curr_page += 1
    # find next page element
    for page_elem in content[10:]:
        if page_elem.get_attribute("href").endswith('$'+str(curr_page)+"')"):
            next_page = page_elem
            break
    else:
        # last page reached, break out of while
        break
    print(f'clicking {next_page.text}...')
    next_page.click()
    sleep(1)
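The matching step in the loop above can be isolated into a small pure function, which makes it easy to check without a browser. This is a sketch only: the name find_next_page_index is hypothetical, and it assumes the pagination hrefs end in a suffix like $2') for page 2, which is what the endswith test above relies on.

```python
def find_next_page_index(hrefs, next_page):
    """Return the index of the href that targets `next_page`, or None.

    `hrefs` is a list of href strings (entries may be None); the expected
    suffix format "$<page>')" is taken from the endswith check in the loop.
    """
    suffix = "$" + str(next_page) + "')"
    for index, href in enumerate(hrefs):
        # Skip anchors without an href, then compare the trailing page marker.
        if href and href.endswith(suffix):
            return index
    return None
```

Because the suffix includes the leading $, a link for page 12 is not mistaken for page 2.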
I placed this code in a function, launch_click_pages. Launched with your URL, it is able to click through the pages (it kept going, but I stopped it at some point):
>>> launch_click_pages('https://business.nh.gov/nsor/search.aspx')
clicking 2...
clicking 3...
clicking 4...
clicking 5...
clicking 6...
clicking 7...
clicking 8...
clicking 9...
clicking 10...
clicking ......
clicking 12...
clicking 13...
clicking 14...
clicking 15...
^C
You can also try executing the script directly, e.g. driver.execute_script("javascript:__doPostBack('ctl00$cphMain$gvwOffender','Page$5')"), and you will be redirected to the fifth page.
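To jump to an arbitrary page this way, the script string can be built from the page number. A minimal sketch, assuming the postback format shown in the example above stays the same for every page; the helper name build_postback_script is made up for illustration.

```python
def build_postback_script(target, page):
    """Build the __doPostBack call for a given grid target and page number.

    The string format mirrors the example above; `target` would be
    'ctl00$cphMain$gvwOffender' for this site.
    """
    return "javascript:__doPostBack('{}','Page${}')".format(target, page)
```

The result would then be passed to driver.execute_script(...), for example with page 5 to reproduce the call above.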
I am trying to loop through the pages and print the values for the table but it can't click the next button.
Error:
selenium.common.exceptions.ElementClickInterceptedException: Message: Element <a class="page-link" href="javascript:move('generic-tokenholders2?a=0xB8c77482e45F1F44dE1745F52C74426C631bDD52&sid=&m=normal&s=16579517055253348798759097&p=2')"> is not clickable at point (1148,2553) because another element <div id="overlay"> obscures it
Page: https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances
My Code:
driver.get("https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances")
wait = WebDriverWait(driver,30)
# num=driver.find_element(By.CSS_SELECTOR, "/html/body/div[2]/div[3]/div/div/ul/li[3]/span/strong[2]").getText()
for i in range(1,20):
    time.sleep(5)
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID,"tokeholdersiframe")))
    print("udj")
    simpleTable = driver.find_element(By.XPATH,"/html/body/div[2]/div[3]/table")
    rows = driver.find_elements(By.TAG_NAME,"tr")
    for i in range(1,len(rows)):
        cols = rows[i].find_elements(By.TAG_NAME,"td")
        for g in cols:
            print(g.text)
    next = wait.until(EC.element_to_be_clickable((By.XPATH,"//div[@class='d-inline-block']//a[@aria-label='Next']")))
    driver.execute_script("arguments[0].scrollIntoView(true);",next)
    driver.execute_script("window.scrollBy(0,-200);")
    next.click()
    driver.switch_to.default_content()
If you look at the loop, you switch into an iframe, perform some actions, and click Next.
When the second page opens, the scope is still inside the previous page's iframe, and you are trying to find an iframe within it, which is not the correct workflow.
You need to call switch_to.default_content() after clicking Next, and then repeat the same steps on each page.
The code below clicked Next without any exception:
driver.get("https://etherscan.io/token/0xB8c77482e45F1F44dE1745F52C74426C631bDD52#balances")
wait = WebDriverWait(driver,30)
for i in range(5):
    time.sleep(5)
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID,"tokeholdersiframe")))
    next = wait.until(EC.element_to_be_clickable((By.XPATH,"//div[@class='d-inline-block']//a[@aria-label='Next']")))
    driver.execute_script("arguments[0].scrollIntoView(true);",next)
    driver.execute_script("window.scrollBy(0,-200);")
    next.click()
    driver.switch_to.default_content()
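One way to make sure the switch back to the default content is never forgotten is to wrap the iframe work in a context manager. This is a sketch rather than part of the answer above: the name inside_frame is hypothetical, and it uses driver.switch_to.frame with a frame id instead of the explicit wait shown in the loop.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame and always switch back out afterwards.

    `frame_reference` can be anything driver.switch_to.frame accepts,
    such as the frame's id or name.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so the next iteration starts
        # from the top-level document again.
        driver.switch_to.default_content()
```

Inside the loop, the table scraping and the click on Next would go in a with inside_frame(driver, "tokeholdersiframe"): block.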
I am trying to scrape information from a popup box that appears after I click a button. The yellow "Contact" button opens a new window revealing a hidden phone number that needs to be scraped. I tried clicking on that button and then scraping the text, but I was unable to do so.
This is the link to the website :
https://www.carlist.my/used-cars-for-sale/mazda/malaysia#1004411695
The error that I was originally getting after getting the very first phone number was
selenium.common.exceptions.ElementClickInterceptedException:
This error was resolved by closing the popup window before trying to click the next "Contact" button.
Here is my revised code:
button = driver.find_elements_by_class_name('listing__ctr.btn.btn--large.btn--primary.one-whole.btn--large.js-contact-seller.js-show-top.js-contact-seller--jump-phone')
for j in button:
    driver.execute_script("arguments[0].scrollIntoView(true);", j)
    driver.execute_script("arguments[0].click();", j)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    time.sleep(5)
    try:
        dealer_name = pop_up.find_element_by_class_name('listing__seller-name.js-chat-profile-fullname.c-seller-name.u-text-5.u-margin-bottom-none').text
        dealer.append(dealer_name)
        print(dealer)
        try:
            phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
            dealer_phoneNo.append(phone)
            print(dealer_phoneNo)
            driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
            time.sleep(3)
            whole = driver.find_element_by_tag_name('html')
            whole.send_keys(Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN)
        except:
            dealer.append('Null')
            dealer_phoneNo.append('Null')
            whole = driver.find_element_by_tag_name('html')
            whole.send_keys(Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN)
    except:
        pass
This revised code now scrapes the phone number and the dealer name properly. I have 2 new issues:
It does it twice for each "Contact" button
and
After it scrolls down to scrape other buttons, one of the pop-up windows does not display the phone number. I thought it was an exception, but the "try..except" is not catching it.
Currently I have no idea how I should move forward.
Thank You in advance.
As we discussed in the comments, it was important to close the window before going on to the next thing.
The scrollIntoView combined with click was not working for anything beyond the first button. At least in Firefox, when I executed the click through JavaScript, it did not mind that what it was clicking on was not visible. But it was a bit ugly to watch, so I scrolled manually with down arrows. Here is the result. Note that some of the listings have 2 phone numbers, which you might want to handle.
Sorry for the red herrings about window handles. It turned out the popup was not really a separate window.
for j in phone:
    driver.execute_script("arguments[0].click();", j)
    time.sleep(3)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
    print(phone)
    driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
    time.sleep(3)
    whole = driver.find_element_by_tag_name('html')
    whole.send_keys(Keys.DOWN+Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN)
Update: As you noted, the above was clicking on each contact twice. That is a specific feature of that page: your locator matched each Contact button twice. So I went with it and processed only every other match. Meanwhile, another issue I had was that there was a contact without a phone number, which was causing an error (I see you added try..except to handle that). That is why I added a check that there were any matches before trying to get text from them. Finally, I got rid of the down arrows, which were strange and imprecise, and went with executing a scroll into view before executing the click. This was scrolling the buttons into view for me, and I see you did so too.
So here is what I ended up with. You can adapt your revised code above to do the same alternating.
i = 0
for j in button:
    i = i + 1
    if (i % 2 == 0):
        continue
    driver.execute_script("arguments[0].scrollIntoView()", j)
    driver.execute_script("arguments[0].click();", j)
    time.sleep(5)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    phones = pop_up.find_elements_by_class_name('flexbox__item.one-whole.u-flex__fill')
    if len(phones) > 0:
        phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
        print(phone)
    driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
    time.sleep(3)
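The counter-based alternation above keeps the 1st, 3rd, 5th, ... matches. If the duplicates really do come in adjacent pairs, as described for this page, the same selection can be expressed with a slice. A small sketch; the function name every_other is hypothetical.

```python
def every_other(items):
    """Keep the 1st, 3rd, 5th, ... items, dropping the duplicate after each.

    This mirrors the `i % 2 == 0: continue` counter in the loop above and
    assumes each element is matched exactly twice, back to back.
    """
    return items[::2]
```

The loop header would then become for j in every_other(button): with the counter lines removed.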
I think final update:
You were right that try..except would catch the error if phone number did not appear. You just needed to recover that by closing the popup window anyway. Here is what that approach could look like (I took out a few things that in my testing I had not declared). I also got rid of the down arrows and kept the alternating as explained above.
buttons = driver.find_elements_by_class_name('listing__ctr.btn.btn--large.btn--primary.one-whole.btn--large.js-contact-seller.js-show-top.js-contact-seller--jump-phone')
i = 0
for button in buttons:
    i = i + 1
    if (i % 2 == 0):
        continue
    driver.execute_script("arguments[0].scrollIntoView(true);", button)
    driver.execute_script("arguments[0].click();", button)
    time.sleep(3)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    # dealer name never missing, so simplified
    dealer_name = pop_up.find_element_by_class_name('listing__seller-name.js-chat-profile-fullname.c-seller-name.u-text-5.u-margin-bottom-none').text
    print(dealer_name)
    try:
        phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
        print(phone)
    except:
        print("missing phone")
    finally:
        driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
        time.sleep(3)
I am trying to scrape a long list of books spread over 10 web pages. When the loop clicks the next > button for the first time, the website displays a login overlay, so Selenium cannot find the target elements.
I have tried several possible solutions:
Use some Chrome options.
Use try-except to click the X button on the overlay. But the overlay appears only once (when clicking next > for the first time). The problem is that when I put this try-except block at the end of the while True: loop, the loop became infinite, because I use continue in except since I do not want to break the loop.
Add some popup blocker extensions to Chrome, but they do not work when I run the code, although I add the extension using options.add_argument('load-extension=' + ExtensionPath).
This is my code:
options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')
driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)
review_dict = {'title':[], 'author':[],'rating':[]}
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_ = 'tableList js-dataTooltip')
while True:
    table = driver.find_element_by_xpath('//*[@id="all_votes"]/table')
    for product in table.find_elements_by_xpath(".//tr"):
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)
        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)
        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
    except NoSuchElementException:
        continue
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
    except TimeoutException:
        break
df = pd.DataFrame.from_dict(review_dict)
df
Any help is appreciated: for example, whether I can change the while loop to a for loop that clicks the next > button until the end, where I should put the try-except block to close the overlay, or whether there is a Chrome option that can disable the overlay.
Thanks in advance
Thank you for sharing your code and the website you are having trouble with. I was able to close the login modal by using XPath. I took this challenge and broke the code up into classes: one class wraps selenium.webdriver.chrome.webdriver, and the other represents the page you wanted to scrape ( https://www.goodreads.com/list/show/32339 ). In the following methods, I used the JavaScript return arguments[0].scrollIntoView(); call and was able to scroll to the last book displayed on the page. After that, I was able to click the next button:
def scroll_to_element(self, xpath : str):
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)

def get_book_count(self):
    return len(self.chrome_driver.find_elements(By.XPATH, "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"))

def click_next_page(self):
    # Scroll to last record and click "next page"
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()
Once I clicked on the "Next" button, I saw the modal display. I was able to find the xpath for the modal and was able to close the modal.
def is_displayed(self, xpath: str, timeout: int = 5):
    # Returns True if the element appears in the DOM within `timeout` seconds
    try:
        webElement = DriverWait(self.chrome_driver, timeout).until(
            DriverConditions.presence_of_element_located(locator = (By.XPATH, xpath))
        )
        return webElement is not None
    except:
        return False

def is_modal_displayed(self):
    return self.is_displayed("//body[@class='modalOpened']")

def close_modal(self):
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if(self.is_modal_displayed()):
        raise Exception("Modal Failed To Close")
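Since the modal only appears once, a variant that does nothing when the close button is absent may be convenient to call on every page. The following is a sketch under that assumption, not the code I ran: the name close_overlay_if_present is hypothetical, and it relies on find_elements returning an empty list (rather than raising) when nothing matches.

```python
def close_overlay_if_present(driver, by, value):
    """Click the first element matching the locator, if there is one.

    Returns True when a close button was found and clicked, False otherwise.
    `driver` is assumed to be a Selenium WebDriver.
    """
    matches = driver.find_elements(by, value)  # [] when nothing matches
    if not matches:
        return False
    matches[0].click()
    return True
```

It could be called after each click on next, for example with By.XPATH and the modal__close locator used in close_modal above.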
I hope this helps you to solve your problem.
I'm new to selenium and webscraping and I'm trying to get information from the link: https://www.carmudi.com.ph/cars/civic/distance:50km/?sort=suggested
Here's a snippet of the code I'm using:
while max_pages > 0:
    results.extend(extract_content(driver.page_source))
    next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
    driver.execute_script('arguments[0].click();', next_page)
    max_pages -= 1
When I try to print results, I always get (max_pages) of the same results from page 1. The "Next page" button is visible in the page and when I try to find elements of the same class, it only shows 1 element. When I try getting the element by the exact xpath and performing the click action on it, it doesn't work as well. I enclosed it in a try-except block but there were no errors. Why might this be?
You are making this more complicated than it needs to be. There's no point in using JS clicks here... just use the normal Selenium clicks.
while True:
    # do stuff on the page
    # find_elements returns an empty list when there is no next-page link
    next_links = driver.find_elements_by_css_selector("a[title='Next page']")
    if next_links:
        next_links[0].click()
    else:
        break
Replace:
next_page = driver.find_element_by_xpath('//div[@class="next-page"]')
driver.execute_script('arguments[0].click();', next_page)
with:
driver.execute_script('next = document.querySelector(".next-page"); next.click();')
If you try next = document.querySelector(".next-page"); next.click(); in console you can see it works.