How to scrape a new popup using selenium? - python

I am trying to scrape information from a popup box that appears after I click a button. It is the yellow "Contact" button, which opens a new window revealing a hidden phone number that needs to be scraped. I tried clicking that button and then scraping the text, but I was unable to do so.
This is the link to the website:
https://www.carlist.my/used-cars-for-sale/mazda/malaysia#1004411695
After the very first phone number was scraped, I was originally getting this error:
selenium.common.exceptions.ElementClickInterceptedException
This error was resolved by closing the popup window before trying to click the next "Contact" button.
Here is my revised code:
button = driver.find_elements_by_class_name('listing__ctr.btn.btn--large.btn--primary.one-whole.btn--large.js-contact-seller.js-show-top.js-contact-seller--jump-phone')
for j in button:
    driver.execute_script("arguments[0].scrollIntoView(true);", j)
    driver.execute_script("arguments[0].click();", j)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    time.sleep(5)
    try:
        dealer_name = pop_up.find_element_by_class_name('listing__seller-name.js-chat-profile-fullname.c-seller-name.u-text-5.u-margin-bottom-none').text
        dealer.append(dealer_name)
        print(dealer)
        try:
            phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
            dealer_phoneNo.append(phone)
            print(dealer_phoneNo)
            driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
            time.sleep(3)
            whole = driver.find_element_by_tag_name('html')
            whole.send_keys(Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN)
        except:
            dealer.append('Null')
            dealer_phoneNo.append('Null')
            whole = driver.find_element_by_tag_name('html')
            whole.send_keys(Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN + Keys.DOWN)
    except:
        pass
This revised code now scrapes the phone number and the dealer name properly, but I have 2 new issues:
1. It does it twice for each "Contact" button.
2. After it scrolls down to scrape other buttons, one of the pop-up windows does not display the phone number. I thought that would raise an exception, but the try/except is not catching it.
Currently I have no idea how to move forward.
Thank you in advance.

As we discussed in the comments, it was important to close the popup window before moving on to the next button.
The scrollIntoView call combined with the click was not working for the buttons after the first one. At least in Firefox, clicking through JavaScript (execute_script) worked even though the target was not visible on screen. But it was a bit ugly to watch, so I scrolled manually with the down-arrow key. Here is the result. Note that some listings have 2 phone numbers, which you might want to handle.
Sorry for the red herrings about window handles. It turned out the popup was not really a separate window.
for j in button:
    driver.execute_script("arguments[0].click();", j)
    time.sleep(3)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
    print(phone)
    # close the popup before clicking the next Contact button
    driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
    time.sleep(3)
    # scroll down a little with the arrow key
    whole = driver.find_element_by_tag_name('html')
    whole.send_keys(Keys.DOWN * 5)
Update: As you noted, the code above was clicking each contact twice. That is specific to this page: your locator matches each Contact button twice. So I went with it and processed every other match. Another issue I ran into was a contact without a phone number, which was causing an error (I see you added try/except to handle that). That is why I added a check that there are any matches before trying to get their text. Finally, I got rid of the down arrows, which were odd and imprecise, and instead executed scrollIntoView before executing the click. This scrolled the buttons into view for me, and I see you did the same.
So here is what I ended up with. You can adapt your revised code above to do the same alternating.
i = 0
for j in button:
    i = i + 1
    # the locator matches each Contact button twice, so skip every second match
    if i % 2 == 0:
        continue
    driver.execute_script("arguments[0].scrollIntoView()", j)
    driver.execute_script("arguments[0].click();", j)
    time.sleep(5)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    # find_elements returns an empty list instead of raising when there is no phone number
    phones = pop_up.find_elements_by_class_name('flexbox__item.one-whole.u-flex__fill')
    if len(phones) > 0:
        phone = phones[0].text
        print(phone)
    # always close the popup, whether or not a phone number was found
    driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
    time.sleep(3)
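The 1-based counter that skips even values keeps the 1st, 3rd, 5th, ... match, which a list slice expresses directly. The sketch below is plain Python; `every_other` and `first_text_or_none` are illustrative helper names, not part of Selenium. `first_text_or_none` mirrors the `len(phones) > 0` check, relying on `find_elements` returning an empty list rather than raising when nothing matches.

```python
def every_other(items):
    """Return the 1st, 3rd, 5th, ... items.

    Same selection as the counter loop: i runs 1, 2, 3, ... and even
    values of i are skipped, so list indices 0, 2, 4, ... are kept.
    """
    return items[::2]


def first_text_or_none(elements):
    """Return the .text of the first element, or None for an empty list.

    find_elements() returns [] when nothing matches, so this never raises.
    """
    return elements[0].text if elements else None


# Possible use with the code above (locators shortened here):
# for j in every_other(driver.find_elements_by_class_name('listing__ctr...')):
#     ...
#     phone = first_text_or_none(pop_up.find_elements_by_class_name('flexbox__item...'))
```

The slice only works if the duplicates really alternate (listing A, listing A, listing B, listing B, ...), which is the behaviour described for this page.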
Final update (I think):
You were right that try/except would catch the error when the phone number did not appear. You just needed to recover by closing the popup window anyway. Here is what that approach could look like (I took out a few things that I had not declared in my testing). I also got rid of the down arrows and kept the alternating explained above.
from selenium.common.exceptions import NoSuchElementException

buttons = driver.find_elements_by_class_name('listing__ctr.btn.btn--large.btn--primary.one-whole.btn--large.js-contact-seller.js-show-top.js-contact-seller--jump-phone')
i = 0
for button in buttons:
    i = i + 1
    # the locator matches each Contact button twice, so skip every second match
    if i % 2 == 0:
        continue
    driver.execute_script("arguments[0].scrollIntoView(true);", button)
    driver.execute_script("arguments[0].click();", button)
    time.sleep(3)
    pop_up = driver.find_element_by_class_name('modal.js-modal-dealer.modal--dealer.modal--dealer-ctr')
    # dealer name is never missing, so no try/except here
    dealer_name = pop_up.find_element_by_class_name('listing__seller-name.js-chat-profile-fullname.c-seller-name.u-text-5.u-margin-bottom-none').text
    print(dealer_name)
    try:
        phone = pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text
        print(phone)
    except NoSuchElementException:
        print("missing phone")
    finally:
        # always close the popup, otherwise the next click is intercepted
        driver.find_element_by_class_name('flexbox__item.modal__destroy.b-close.weight--light.js-modal-destroy').click()
        time.sleep(3)
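The try/finally idea can also be packaged as a context manager, so the close step runs however reading the popup ends. This is a plain-Python sketch: `popup` is a hypothetical helper name, and `open_popup` / `close_popup` stand for whatever opens and closes the modal (for example, the JavaScript click on the Contact button and the click on the `js-modal-destroy` element above).

```python
from contextlib import contextmanager


@contextmanager
def popup(open_popup, close_popup):
    """Open a popup, hand control to the with-block, and always close it.

    close_popup() runs in the finally clause, so it executes even if the
    with-block raises (e.g. a missing phone number). The exception still
    propagates afterwards, so the caller decides how to handle it.
    """
    open_popup()
    try:
        yield
    finally:
        close_popup()


# Possible use with the code above:
# with popup(lambda: driver.execute_script("arguments[0].click();", button),
#            lambda: driver.find_element_by_class_name('flexbox__item.modal__destroy...').click()):
#     print(pop_up.find_element_by_class_name('flexbox__item.one-whole.u-flex__fill').text)
```

If `open_popup()` itself fails, nothing is closed, which is usually what you want because no modal was opened.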

Related

Selenium cannot locate element

I am using selenium to create a kahoot bot flooder. (kahoot.it) I am trying to use selenium to locate the input box, as well as the confirm button. Whenever I try to define them as a variable, I get this. "Command raised an exception: TimeoutException: Message:", which I think means that the 5 seconds that I set has expired, meaning that the element was never located.
for idr in tabs:
    num += 1
    drv.switch_to.window(idr)
    time.sleep(0.3)
    gameid = WebDriverWait(drv, 5).until(EC.presence_of_element_located((By.CLASS_NAME, "sc-bZSQDF bXdUBZ")))
    gamebutton = WebDriverWait(drv, 5).until(EC.presence_of_element_located((By.CLASS_NAME, "sc-iqHYGH eMQRbB sc-geEHAE kTTBHH")))
    gameid.send_keys(gamepin)
    gamebutton.click()
    time.sleep(0.8)
    try:
        nick = WebDriverWait(drv, 5).until(EC.presence_of_element_located((By.CLASS_NAME, "sc-bZSQDF bXdUBZ")))
        nickbutton = WebDriverWait(drv, 5).until(EC.presence_of_element_located((By.CLASS_NAME, "sc-iqHYGH eMQRbB sc-ja-dpGc gYusMa")))
        nick.send_keys(f'{name}{num - 1}')
        nickbutton.click()
    except:
        pass  # handler body not shown in the question
I tried locating an "Iframe" which wasn't really successful (might have done it wrong), but I have been searching for hours and haven't found any answers. Any help would be appreciated.
The class names of the input and button tags contain spaces, but By.CLASS_NAME accepts only a single class name.
For the input tag you can use the name attribute, and for the button tag you can use the tag name, since it is the only button tag in the DOM.
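from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)
gameinput = wait.until(EC.presence_of_element_located((By.NAME, "gameId")))
gameinput.send_keys("Sample Text")
submit = wait.until(EC.presence_of_element_located((By.TAG_NAME, "button")))
submit.click()
# This also worked (both classes joined into one CSS selector):
gameinput = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".sc-bZSQDF.bXdUBZ")))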
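Because By.CLASS_NAME takes a single class, a class attribute with spaces has to be rewritten as a compound CSS selector, which is exactly what `.sc-bZSQDF.bXdUBZ` is. A small helper can do that conversion; `class_attr_to_css` is a hypothetical name, and the sketch does not escape characters that are special in CSS (such as `:`), which plain generated class names like these do not contain.

```python
def class_attr_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    "sc-bZSQDF bXdUBZ"              -> ".sc-bZSQDF.bXdUBZ"
    "sc-bZSQDF bXdUBZ", tag="input" -> "input.sc-bZSQDF.bXdUBZ"
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    return tag + "".join("." + c for c in classes)


# Possible use with the wait above:
# wait.until(EC.presence_of_element_located(
#     (By.CSS_SELECTOR, class_attr_to_css("sc-bZSQDF bXdUBZ"))))
```

Class names like `sc-bZSQDF` look auto-generated and may change between site builds, which is a good reason to prefer the `name` attribute where one exists, as the answer does.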

Blocking login overlay window when scraping web page using Selenium

I am trying to scrape a long list of books spread over 10 web pages. When the loop clicks the "next >" button for the first time, the website displays a login overlay, so Selenium cannot find the target elements.
I have tried all the solutions I could find:
Using some Chrome options.
Using try-except to click the X button on the overlay. But the overlay appears only once (when clicking next > for the first time). The problem is that when I put this try-except block at the end of the while True: loop, the loop became infinite, because I use continue in the except block (I do not want to break the loop).
Adding some popup-blocker extensions to Chrome, but they do not work when I run the code, even though I load the extension with options.add_argument('load-extension=' + ExtensionPath).
This is my code:
options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')
driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)
review_dict = {'title': [], 'author': [], 'rating': []}
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_='tableList js-dataTooltip')
while True:
    table = driver.find_element_by_xpath('//*[@id="all_votes"]/table')
    for product in table.find_elements_by_xpath(".//tr"):
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)
        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)
        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
    except NoSuchElementException:
        continue
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
    except TimeoutException:
        break
df = pd.DataFrame.from_dict(review_dict)
df
Any help is welcome: for example, whether I can replace the while loop with a for loop that clicks next > until the end, where I should put the try-except block that closes the overlay, or whether there is a Chrome option that disables the overlay.
Thanks in advance
Thank you for sharing your code and the website you are having trouble with. I was able to close the login modal by using XPath. I broke the code up into classes: one wraps selenium.webdriver.chrome.webdriver, and the other represents the page you wanted to scrape ( https://www.goodreads.com/list/show/32339 ). In the following methods, I used the JavaScript call return arguments[0].scrollIntoView(); to scroll to the last book displayed on the page. After that, I was able to click the Next button.
def scroll_to_element(self, xpath: str):
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)

def get_book_count(self):
    return len(self.chrome_driver.find_elements(By.XPATH, "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"))

def click_next_page(self):
    # Scroll to the last record, then click "next page"
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()
Once I clicked the "Next" button, the modal appeared. I found the XPath for the modal and was able to close it.
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def is_displayed(self, xpath: str, timeout: int = 5):
    try:
        # until() returns the element, or raises TimeoutException if it never appears
        WebDriverWait(self.chrome_driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return True
    except TimeoutException:
        return False

def is_modal_displayed(self):
    return self.is_displayed("//body[@class='modalOpened']")

def close_modal(self):
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if self.is_modal_displayed():
        raise Exception("Modal Failed To Close")
I hope this helps you to solve your problem.
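On the loop-structure part of the question: as the question notes, `continue` in the except branch skips the Next click, so the same page is scraped over and over. Also, `wait.until(...)` raises TimeoutException (not NoSuchElementException) when the close button never becomes clickable. A bounded for loop that closes the overlay only when it is present avoids both problems. This is a duck-typed sketch: `close_overlay_if_present`, `scrape_pages`, `scrape_page` and `max_pages` are hypothetical names, the close-button XPath is the one from the answer above, and the Next-link XPath assumes the `next_page` class used in the question.

```python
XPATH = "xpath"  # the string value of selenium's By.XPATH

CLOSE_XPATH = "//div[@class='modal__content']//div[@class='modal__close']"
NEXT_XPATH = "//a[@class='next_page']"


def close_overlay_if_present(driver):
    """Click the login overlay's close button if it exists; never raises."""
    buttons = driver.find_elements(XPATH, CLOSE_XPATH)
    if buttons:
        buttons[0].click()
        return True
    return False


def scrape_pages(driver, scrape_page, max_pages=10):
    """Scrape up to max_pages pages, dismissing the overlay whenever it shows up.

    scrape_page(driver) should return a list of rows for the current page.
    The loop stops early when there is no Next link (the last page).
    """
    rows = []
    for _ in range(max_pages):
        close_overlay_if_present(driver)
        rows.extend(scrape_page(driver))
        next_links = driver.find_elements(XPATH, NEXT_XPATH)
        if not next_links:
            break
        next_links[0].click()
    return rows
```

Against the live site, an explicit wait after each Next click (for the new table or the overlay) will likely still be needed, because `find_elements` checks the DOM only once.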

Elem could not be scrolled into view

I'm new to Python and Selenium.
For fun, I'm scraping a page. I have to click a first button for Comments, and then another button for All comments, so that I can get all of them.
The first click works, but not the second.
I've set a hardcoded scroll, but still not working.
This is the python code I'm working on:
boton = driver.find_element_by_id('tabComments_btn')
boton.click()
wait = WebDriverWait(driver, 100)
# From here on, it doesn't work (it scrolls, but it says 'elem can't be scrolled into view')
driver.execute_script("window.scrollTo(0, 1300)")
botonTodos = driver.find_element_by_class_name('thread-node-children-load-all-btn')
wait = WebDriverWait(driver, 100)
botonTodos.click()
If I only click the first button, I'm able to scrape the first 10 comments, so this is working.
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'thread-node-message')))
for elm in driver.find_elements_by_css_selector(".thread-node-message"):
    print(elm.text)
This is the part of the HTML I'm stuck on:
Load next 10 comments
Load all comments
Publicar un comentario ("Post a comment")
There's a whitespace #text node between each of these elements.
Any ideas welcome.
Thanks.
Here are the different options.
# first create the element references
load_next_btn = driver.find_element_by_css_selector(".thread-node-children-load-next-btn")
load_all_btn = driver.find_element_by_css_selector(".thread-node-children-load-all-btn")
# scroll to the button you are interested in (here, load_all_btn)
# Option 1: reading this property scrolls the element into view
load_all_btn.location_once_scrolled_into_view
# Option 2
driver.execute_script("arguments[0].scrollIntoView();", load_all_btn)
# Option 3
btn_location = load_all_btn.location
driver.execute_script(f"window.scrollTo({btn_location['x']}, {btn_location['y']});")
Test code (check whether this works):
url = "https://stackoverflow.com/questions/55228646/python-selenium-cant-sometimes-scroll-element-into-view/55228932?noredirect=1#comment97192621_55228932"
driver.get(url)
element = driver.find_element_by_xpath("//a[.='Contact Us']")
element.location_once_scrolled_into_view
time.sleep(1)
driver.find_element_by_xpath("//p[.='active']").location_once_scrolled_into_view
driver.execute_script("arguments[0].scrollIntoView();",element)
time.sleep(1)
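If the button still cannot be clicked after scrolling, one possible cause is a fixed header or footer covering it: a plain `scrollIntoView()` aligns the element with the top of the viewport. A sketch that centers the element instead, with an optional JavaScript click, is below; `scroll_center_and_click` is a hypothetical helper name and `driver` / `element` are duck-typed.

```python
CENTER_SCRIPT = "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'});"
JS_CLICK = "arguments[0].click();"


def scroll_center_and_click(driver, element, js_click=False):
    """Scroll element to the middle of the viewport, then click it.

    Centering keeps the element clear of fixed headers that can cover an
    element aligned to the top (the default scrollIntoView() behaviour).
    With js_click=True the click is dispatched from JavaScript, which does
    not check whether another element overlaps the target.
    """
    driver.execute_script(CENTER_SCRIPT, element)
    if js_click:
        driver.execute_script(JS_CLICK, element)
    else:
        element.click()


# Possible use with the question's button:
# load_all_btn = driver.find_element_by_css_selector(".thread-node-children-load-all-btn")
# scroll_center_and_click(driver, load_all_btn)
```

A native `element.click()` behaves more like a real user; the JavaScript click is a fallback for when an overlay keeps intercepting it.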

Selenium clicking to next page until on last page

I am trying to keep clicking through to the next page on this website, appending the table data to a CSV file each time; when I reach the last page, I want to append its table data and break the while loop.
Unfortunately, for some reason it stays stuck on the last page, and I have tried several different methods to catch the error.
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
    except:
        print("No more pages left")
        break
driver.quit()
I also tried this one:
try:
    link = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.pagination-next a')))
    driver.execute_script('arguments[0].scrollIntoView();', link)
    link.click()
except:
    keep_going = False
I've added print statements, and it just keeps staying on the last page.
Here is the HTML of the Next button on the first page and on the last page; I'm not sure whether I could use this somehow:
HTML for the first page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next" style="">
Next
</li>
HTML for the last page:
<li role="menuitem" ng-if="::directionLinks" ng-class="{disabled: noNext()||ngDisabled}" class="pagination-next disabled" style="">
Next
</li>
You can solve the problem as follows.
The Next button is enabled on every page except the last, where it is disabled.
So you can build two lists: one for the enabled Next button and one for the disabled Next button. At any point in time, exactly one of these lists has size one. If the button is disabled, break out of the while loop; otherwise, click the Next button.
I am not familiar with Python syntax, so you can convert the Java code below and use it. It will work for sure.
Code:
boolean hasNextPage = true;
while (hasNextPage) {
    List<WebElement> enabled_next_page_btn = driver.findElements(By.xpath("//li[@class='pagination-next']/a"));
    List<WebElement> disabled_next_page_btn = driver.findElements(By.xpath("//li[@class='pagination-next disabled']/a"));
    // If the Next button is enabled/available, enabled_next_page_btn has size one,
    // so you can perform the click action and then do your per-page work
    if (enabled_next_page_btn.size() > 0) {
        enabled_next_page_btn.get(0).click();
        hasNextPage = true;
    } else if (disabled_next_page_btn.size() > 0) {
        System.out.println("No more Pages Available");
        break;
    }
}
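A Python version of the same enabled/disabled check might look like the sketch below. It matches `pagination-next` and `disabled` as whole class tokens, so it keeps working if the site adds other classes to the `<li>` (an exact `@class='pagination-next'` comparison would then fail). `next_page_link` and `has_class` are hypothetical helper names, and the XPath follows the HTML shown in the question.

```python
XPATH = "xpath"  # the string value of selenium's By.XPATH

# <a> inside an <li> that has the class token "pagination-next" but not "disabled"
ENABLED_NEXT = (
    "//li[contains(concat(' ', normalize-space(@class), ' '), ' pagination-next ')"
    " and not(contains(concat(' ', normalize-space(@class), ' '), ' disabled '))]/a"
)


def has_class(class_attr, name):
    """True if name is one of the whitespace-separated tokens in class_attr.

    Useful with element.get_attribute("class"), which may return None.
    """
    return name in (class_attr or "").split()


def next_page_link(driver):
    """Return the enabled Next link, or None on the last page."""
    links = driver.find_elements(XPATH, ENABLED_NEXT)
    return links[0] if links else None


# Scrape first, then move on, so the last page's table is saved too:
# while True:
#     save_table(driver)          # hypothetical scraping step
#     link = next_page_link(driver)
#     if link is None:
#         break
#     link.click()
```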
The next_page_btn.index(0).click() line wasn't working (list.index() searches for a value; the first element is next_page_btn[0]), but checking the length of next_page_btn worked to detect the last page, so I was able to do this:
while True:
    next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
    if len(next_page_btn) < 1:
        print("No more pages left")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Next'))).click()
Thanks so much for the help!
How about using a do/while loop and exiting once the class "disabled" appears in the Next button's attributes? (Excuse the syntax. I just threw this together and haven't tried it.)
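from selenium.webdriver.common.by import By

# Python has no do/while, so emulate it with while True and break
while True:
    # ... scrape the current page here ...
    # the "disabled" class sits on the <li>, not on the <a> inside it
    next_li = driver.find_element(By.CSS_SELECTOR, "li.pagination-next")
    class_attribute = next_li.get_attribute("class")
    if "disabled" in class_attribute.split():
        break
    next_li.find_element(By.LINK_TEXT, "Next").click()
driver.quit()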
The XPath of the button is:
//li[@class = 'pagination-next']/a
So every time you need to load the next page, you can click this element:
next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
next_page_btn[0].click()
Note: you should add some logic to stop on the last page:
while True:
    next_page_btn = driver.find_elements_by_xpath("//li[@class = 'pagination-next']/a")
    if len(next_page_btn) < 1:
        print("No more pages left")
        break
    else:
        # do stuff, then go to the next page
        next_page_btn[0].click()

python selenium firefox behavior

I am using a Firefox browser with Selenium. I am scraping a website that has multiple pages, like Google search, where you can pick the page at the bottom. On each page I click an element (again, like Google) and scrape data from that element's information page. If I am on an element's information page reached from the third page and click the back button in my regular Firefox browser, it goes back to the third page. But when I go back in Selenium with driver.back(), it takes me back to the first page. Does anyone know how to fix this?
count = 1
while 1:
    try:
        pages = driver.find_elements_by_css_selector("a.page-number.gradient")
    except:
        break
    for page in pages:
        if page.text == str(count):
            page.click()
            print(count)
            break
    states = driver.find_elements_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr/td[19]")
    fails = []
    i = 1
    for state in states:
        if state.text == "FAILED":
            fails.append(i)
        i += 1
    for fail in fails:
        print(driver.find_element_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[19]").text)
        driver.find_element_by_xpath("//*[@id='table_div']/div/div/table/tbody/tr[" + str(fail) + "]/td[1]/input").click()
        time.sleep(2)
        errors = driver.find_element_by_name("errors")
        if "\n" in errors.text:
            fixedText = errors.text.split("\n")[0]
            errors.clear()
            errors.send_keys(fixedText)
            time.sleep(1)
            driver.find_element_by_name('post_type').click()
            time.sleep(5)
            driver.switch_to.alert.accept()
            driver.switch_to.alert.accept()
            driver.back()
            driver.back()
        else:
            driver.back()
            driver.switch_to.alert.accept()
            driver.switch_to.alert.accept()
    count += 1
The code is really complicated, but basically it's the driver.back() lines that aren't working.
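No answer is included here. One workaround that does not depend on what `driver.back()` restores is to re-select the current page number explicitly after going back, since the loop already tracks it in `count`. This is a hedged sketch, not a diagnosis of why the history behaves differently: `go_to_page` is a hypothetical helper, and the `a.page-number.gradient` selector is the one from the question.

```python
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR
PAGE_LINKS = "a.page-number.gradient"  # pagination selector from the question


def go_to_page(driver, number):
    """Click the pagination link whose text is `number`.

    Returns True if such a link was found and clicked, False otherwise
    (e.g. if the pagination bar does not show that page number).
    """
    for link in driver.find_elements(CSS, PAGE_LINKS):
        if link.text.strip() == str(number):
            link.click()
            return True
    return False
```

In the question's loop, this would be called as `go_to_page(driver, count)` right after the `driver.back()` calls and after any alerts have been accepted, since an open alert blocks further element lookups.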
