Getting next page data through selenium and request - python

I'm looking to click on the next page to get the listed product data: https://www.riteaid.com/shop/grocery
When I look at the HTML element, the Next button is actually not a button element, but a span tag. Looking at the developer tools, I noticed that there is a tag that dynamically changes when I click on the next button without moving to a different url.
<div data-pager="p:pg1" data-tp="89" data-cp="2">
How would one go about getting the next page data or send the request to get the next page data? I thought I'd need selenium since it is interactive, but I got the error "selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable".
Thanks!

You can invoke click onto the element.
driver.get("https://www.riteaid.com/shop/grocery")
wait = WebDriverWait(driver, 10)
elem = wait.until(EC.presence_of_element_located((By.XPATH, "//span[text()='Next']/parent::div")))
driver.execute_script("arguments[0].click();", elem)
Import
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

I checked the site and you can navigate to next page like this:
next_page_xpath = "//div[#class='rfk_next']"
driver.find_element_by_xpath(next_page_xpath).click()
Do remember to do while and put above code in try/except, so that it breaks from while loop once there will be no next page.

Related

How to advance into pagination with Web Scraping if click button doesn't exist

So, i'm trying to navigate into a pharmacy web site with Selenium (Python). This web site provides a catalog of thousands medicines and health products.
Im trying to do an "horizontal" web scraping, extracting the links for every single product in every page of the catalog (at this moment i can do that).
The problem came when i'm trying to advance to the next page of the catalog, i don't have a click button and the URL doesn't change.
url: https://salcobrand.cl/t/medicamentos
The buttons the advance in the pagination is looks like this:
And the HTML have the next path:
I wonder if someone can help with the code in selenium or any other library.
Thanks!
If you want to click to any element, it have to clickable (it should be somewhere on screen of web browser). That can be achieved by scrolling down the page using little JavaScript and driver.execute_script.
Working code looks like this:
import time
from selenium import webdriver
driver = webdriver.Chrome("chromedriver")
driver.get("https://salcobrand.cl/t/medicamentos")
# Scroll to bottom, so button can be clicked
driver.execute_script("window.scrollTo(0, document.body.scrollHeight - 2000);")
time.sleep(1) # Wait for page to render
# Click on element
button = driver.find_element_by_xpath("//*[#id='content']/nav/ul/li[2]/a")
button.click()
wait=WebDriverWait(driver,0)
driver.get('https://salcobrand.cl/t/medicamentos')
while True:
try:
elem=wait.until(EC.element_to_be_clickable((By.XPATH,"//a[.='ยป']")))
driver.execute_script("arguments[0].scrollIntoView();",elem)
time.sleep(1)
driver.execute_script("arguments[0].click()", elem)
elem.click()
time.sleep(1)
except Exception as e:
print(str(e))
break
You can go through all 42 pages like so just by looking for that element and then clicking it.
Import:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
In order to go through all the available pages you can click next page number until pagination button with next page will be available.
I mean starting from i = 2 you can search for button with number 2 until button with number n will be not existing.
After each click on the pagination button you will have to scrape the next page content, scroll down to the pagination button, click it and wait until the new content is loaded.
Something like this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get('https://salcobrand.cl/t/medicamentos')
next_page = 2
while True:
#here scrape your data
next = driver.find_elements(By.XPATH,"//ul[#class='pagination pagination-sm']//a[#href='#' and text()=" + str(next_page) + "]")
if next:
actions.move_to_element(next[0]).perform()
time.sleep(0.5)
driver.find_element(By.XPATH,"//ul[#class='pagination pagination-sm']//a[#href='#' and text()=" + str(next_page) + "]").click()
else:
break
Indicating the new content is loaded can be done as following:
Keep some presented search result on the current page.
After clicking the next page button use Expected Condition of invisibility_of_element_located to wait for that element to disappear.

Cannot click a button because Selenium is unable to locate an element

I have tried to make my script click the Purchase/Buy Family button on the Spotify Checkout Page. No matter what class, CSS, XPath, ID, or whatever I put in, it's just saying it could not find the object.
This is the button. It's not in an iframe:
<div class="sc-fzXfOu cvoJMt">
<button id="checkout_submit" class="Button-oyfj48-0 kaOWUo sc-fzXfOv tSdMK">
Buy Premium Family
</button>
</div>
My code:
time.sleep(3)
buy = driver.find_element_by_xpath("/html/body/div[3]/div/div/div/div/div/div/div[3]/div/div/div[2]/form/div[2]/button").click()
I am able to click the button,by a different xpath
driver.findElement(By.xpath("//button[#id='checkout_submit']")).click();
Edit -
Your xpath also works for me only when i load the page initially and there is no change in the dom - /html/body/div[3]/div/div/div/div/div/div/div[3]/div/div/div[2]/form/div[2]/button
It doesn't work when some new event or error is displayed and the dom structure changes.
Why use such relative XPath when the element has distinguishable attributes?
The problem here is the form is not static you have to wait loading of all elements.
And page loads with pane to accept cookies who can mask elements you want to run action on.
The best way in that case is first accept cookies and then run all actions you need.
Try to adapt this code for your needs, that run with success on my test.
In case you run this code you have to brake execution to login first, when the driver get the page.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.remote.webelement import WebElement
# change this line
path_driver = "your_path_to_chrome_driver"
by, buy_selector, cookies_selector = By.CSS_SELECTOR, 'button#checkout_submit', "button#onetrust-accept-btn-handler"
driver = webdriver.Chrome(path_driver)
driver.maximize_window()
actions = ActionChains(driver)
driver.get("https://www.spotify.com/us/purchase/offer/premium-family/?country=US")
# wait for loading buy button
sls = wait.until(EC.presence_of_all_elements_located((by, buy_selector)))
if sls:
# get accept cookies button element and click
cookies_accept = driver.find_element_by_css_selector(cookies_selector)
if isinstance(cookies_accept, WebElement):
cookies_accept.click()
# get buy button element, move to element and click
buy = driver.find_element_by_css_selector(buy_selector)
if isinstance(buy, WebElement) and buy.is_displayed() and buy.is_enabled():
actions.move_to_element(buy).click(buy).perform()

How to click a button on a website by finding its id

When I run the code, the website loads up fine but then it won't click on the button- an error appears saying the element is not interacterble. What do I need to do to click the button? I am relatively new to this and would be grateful for any help.
I have already tried finding it by id and tag.
page = driver.get("https://kenpreston.co.uk/author/")
element = driver.find_element_by_id('mk-button-31')
element.click()
SOLVED:
I used driver.find_element_by_link_text and this worked fine.
I have checked the website and noticed that mk-button-31 is an id for a div tag and inside it there is an a tag. Try getting the url from the a tag and do another driver.get instead of clicking on it.
Also the whole div tag is not clickable so that is why you are getting this error.
Use sleep from time library to be sure page fully loaded
from time import sleep
page = driver.get("https://kenpreston.co.uk/author/")
sleep(2)
element = driver.find_element_by_id('mk-button-31')
element.click()
Looks like your element is not clickable you need to replace this id selector with the css and need to wait for the element before click on it.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
page = driver.get("https://kenpreston.co.uk/author/")
element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#mk-button-31 span"))
element.click();
Consider adding Explicit Wait to your script as it might be the case the DOM had finished loading and the button you're looking for is still not there.
The classes you're looking for are:
WebDriverWait
expected_conditions
Suggested code change:
#your other imports here
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
#your other code here
page = driver.get("https://kenpreston.co.uk/author/")
element = WebDriverWait(driver, 10).until(expected_conditions.element_to_be_clickable((By.ID, "mk-button-31")))
element.click()
More information: How to use Selenium to test web applications using AJAX technology

cannot locate element within a web page using selenium python

I just want to write a simple log in code for one website. However, I think the log in page was written in JS. It's really hard to locate the elements with selenium.
The web page I am going to play with is:
"https://www.nike.com/snkrs/login?returnUrl=%2F"
This is how the page looks like and how the inspect element page look like:
I was trying to locate the element by following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login?returnUrl=%2Fthread%2Fe9680e08e7e3cd76b8832684037a58a369cad5ed")
time.sleep(5)
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
elem =driver.find_element_by_xpath("//*[#id='ce3feab5-6156-441a-970e-23544473a623']")
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
driver.close()
This code return the error that the element could not find by [#id='ce3feab5-6156-441a-970e-23544473a623'.
I tried playing with frames, it seems does not work. If I went to "view web source" page, it is full of JS code.
Is there a good way to play with such a web page with selenium?
Try changing the code :
elem =driver.find_element_by_xpath("//*[#id='ce3feab5-6156-441a-970e-23544473a623']")
to
elem =driver.find_element_by_xpath("//*[#type='email']")
My guess (and observation) is that the id changes each time you visit the page. The id looks auto-generated, and when I go to the page multiple times, the id is different each time.
You'll need to search for something that doesn't change. For example, you can search for the name attribute, which has the seemingly static value "emailAddress"
element = driver.find_element_by_name("emailAddress")
You could also use an xpath expression to search for other attributes, such as data-componentname:
element = driver.find_element_by_xpath("//input[#data-componentname='emailAddress']")
Also, instead of a hard-coded sleep, you can simply wait for the element to be visible:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login")
element = WebDriverWait(driver, 10).until(
EC.visibility_of_element_located((By.NAME, "emailAddress"))
)

Use selenium to click and view more text

I'm very new to Selenium.
I'm crawling data from this page. I need to scroll down the page and click on "Load More Arguments" to get more text. This is the location to click on.
<a class="debate-more-btn" href="javascript:void(0);" onclick="loadMoreArguments('15F7E61D-89B8-443A-A21C-13FD5EAA6087');">
Load More Arguments
</a>
I have tried this code but it does not work. Should I need more code to locate to that (I think the 1 has already tell the location to click). Do you have any recommendation? Thank you in advance.
[1] btn_moreDebate = driver.find_elements_by_class_name("debate-more-btn")
[2] btn.click()
Find the link by link text, move to the element and click:
from selenium.webdriver.common.action_chains import ActionChains
link = driver.find_element_by_link_text('Load More Arguments')
ActionChains(browser).move_to_element(link).perform()
link.click()
If you get an exception while finding an element, you may need to use an Explicit Wait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Load More Arguments")))
ActionChains(browser).move_to_element(link).perform()
link.click()
If I understand your code correctly, I can see a few things wrong.
1. You're using find_elements_by_class_name. I'd recommend using find_element_by_class_name instead. elements returns a list, which isn't needed in a case where there is only one element.
2. You're using btn_moreDebate as the holder for the results of your find_elements, but then interacting with btn.
You should be able to perform the find and click in one action:
driver.find_element_by_class_name("debate-more-btn").click()

Categories