Hello everyone, can anyone help me with scrolling https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products
I want to scroll it using
actions = ActionChains(browser)
actions.send_keys(Keys.PAGE_DOWN)
actions.perform()
until it reaches the bottom of the page, where it will find a "Load More" element:
loadMoreButton = browser.find_element_by_css_selector(
".btn.list-view__load-more.list-view__load-more--js")
loadMoreButton.click()
Once the Load More button is clicked, it has to perform the scroll action again and then the Load More click again, until the Load More button is no longer available.
I have to use this page-down action because the element does not load until the page is scrolled down to it. If anyone could suggest a solution, it would be of great help.
This has worked for me with zero issues...
from selenium.webdriver.common.keys import Keys
driver.find_element_by_tag_name('body').send_keys(Keys.PAGE_DOWN)
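If you want to combine that with the Load More loop from the question, a minimal sketch could look like the following (it reuses the CSS selector from the question; the sleep durations and the bottom-of-page check are assumptions):
import time
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException

body = driver.find_element_by_tag_name('body')
while True:
    body.send_keys(Keys.PAGE_DOWN)   # scroll one viewport at a time
    time.sleep(0.5)                  # give lazy-loaded products time to render (assumed delay)
    try:
        load_more = driver.find_element_by_css_selector(
            ".btn.list-view__load-more.list-view__load-more--js")
        if load_more.is_displayed():
            load_more.click()        # load the next batch, then keep scrolling
            time.sleep(1)
    except NoSuchElementException:
        # no Load More button left; stop once we have also reached the bottom of the page
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset >= document.body.scrollHeight;")
        if at_bottom:
            break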
To scroll the page https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products until it reaches the bottom, where it finds an element with the text View More, and then click that element until it is no longer available, you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import TimeoutException
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
browser=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
browser.get("https://www.grainger.com/category/black-pipe-fittings/pipe-fittings/pipe-tubing-and-fittings/plumbing/ecatalog/N-qu1?searchRedirect=products")
while True:
    try:
        browser.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='btn list-view__load-more list-view__load-more--js' and normalize-space()='View More']"))))
        browser.execute_script("arguments[0].click();", WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='btn list-view__load-more list-view__load-more--js' and normalize-space()='View More']"))))
        print("View More button clicked")
    except (TimeoutException, StaleElementReferenceException) as e:
        print("No more View More buttons")
        break
browser.quit()
Console Output:
View More button clicked
View More button clicked
No more View More buttons
@PedroLobito I am trying to retrieve the product links, can you help me with this?
No need for Selenium in this case; just sniff the XHR requests via developer tools and go straight to the gold (JSON).
The URL structure for products is as follows:
https://www.x.com/product/anything-Item#
Just add the Item # value from the JSON object to the end of the URL, something like:
https://www.x.com/product/anything-5P540
https://www.x.com/product/anything-5P541
...
Python 3 example (for Python 2, just change the f-string syntax):
import json
import requests
main_cat = "WP7115916"
sub_cat = "4836"
x = requests.get(f"https://www.x.com/product/tableview/GRAINGER-APPROVED-Square-Head-Plugs-{main_cat}/_/N-qu1?searchRedirect=products&breadcrumbCatId={sub_cat}&s_pp=false").json()
for p in x['records']:
    for childs in p['children']:
        for item in json.loads(childs['collapseValues']):
            url = f"https://www.x.com/product/lol-{item['sku']}"
            print(url)
https://www.x.com/product/lol-5P540
https://www.x.com/product/lol-5P541
https://www.x.com/product/lol-5P542
https://www.x.com/product/lol-5P543
https://www.x.com/product/lol-5P544
https://www.x.com/product/lol-5P545
https://www.x.com/product/lol-5P546
https://www.x.com/product/lol-5P547
https://www.x.com/product/lol-5P548
...
One of the best methods for smooth scrolling...
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

html = driver.find_element(By.XPATH, '//body')
total_scrolled = 0
page_height = driver.execute_script("return document.body.scrollHeight")
while total_scrolled < page_height:
    html.send_keys(Keys.PAGE_DOWN)
    total_scrolled += 400
    time.sleep(.5)
So, I'm trying to navigate a pharmacy website with Selenium (Python). This website provides a catalog of thousands of medicines and health products.
I'm trying to do a "horizontal" web scraping, extracting the links for every single product on every page of the catalog (at this moment I can do that).
The problem comes when I try to advance to the next page of the catalog: I don't have a click button and the URL doesn't change.
url: https://salcobrand.cl/t/medicamentos
The buttons to advance through the pagination look like this:
And the HTML has the following path:
I wonder if someone can help with the code in Selenium or any other library.
Thanks!
If you want to click any element, it has to be clickable (it should be somewhere on the screen of the web browser). That can be achieved by scrolling down the page using a little JavaScript and driver.execute_script.
Working code looks like this:
import time
from selenium import webdriver
driver = webdriver.Chrome("chromedriver")
driver.get("https://salcobrand.cl/t/medicamentos")
# Scroll to bottom, so button can be clicked
driver.execute_script("window.scrollTo(0, document.body.scrollHeight - 2000);")
time.sleep(1) # Wait for page to render
# Click on element
button = driver.find_element_by_xpath("//*[@id='content']/nav/ul/li[2]/a")
button.click()
wait = WebDriverWait(driver, 0)
driver.get('https://salcobrand.cl/t/medicamentos')
while True:
    try:
        elem = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[.='»']")))
        driver.execute_script("arguments[0].scrollIntoView();", elem)
        time.sleep(1)
        driver.execute_script("arguments[0].click()", elem)
        elem.click()
        time.sleep(1)
    except Exception as e:
        print(str(e))
        break
You can go through all 42 pages this way, just by looking for that element and then clicking it.
Imports:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
In order to go through all the available pages you can keep clicking the next page number until the pagination button for the next page is no longer available.
I mean, starting from i = 2 you can search for the button with number 2, then 3, and so on, until a button with number n no longer exists.
After each click on the pagination button you will have to scrape the next page content, scroll down to the pagination button, click it and wait until the new content is loaded.
Something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome(executable_path='chromedriver.exe')
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)
driver.get('https://salcobrand.cl/t/medicamentos')

next_page = 2
while True:
    # here scrape your data
    next_btn = driver.find_elements(By.XPATH, "//ul[@class='pagination pagination-sm']//a[@href='#' and text()=" + str(next_page) + "]")
    if next_btn:
        actions.move_to_element(next_btn[0]).perform()
        time.sleep(0.5)
        driver.find_element(By.XPATH, "//ul[@class='pagination pagination-sm']//a[@href='#' and text()=" + str(next_page) + "]").click()
        next_page += 1
    else:
        break
Detecting that the new content has loaded can be done as follows:
Keep a reference to some search result presented on the current page.
After clicking the next page button, use the invisibility_of_element_located expected condition to wait for that element to disappear, as in the sketch below.
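A rough sketch of that idea (the .product-name selector and the 20-second timeout are assumptions, not taken from the actual page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)

# remember one result that is currently visible (hypothetical selector)
marker_text = driver.find_element(By.CSS_SELECTOR, ".product-name").text

next_btn[0].click()  # the pagination button located as shown above

# the old entry disappearing signals that the next page content has replaced it
wait.until(EC.invisibility_of_element_located(
    (By.XPATH, "//*[contains(@class,'product-name') and text()='" + marker_text + "']")))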
Link to the page I am trying to scrape:
https://www.nytimes.com/reviews/dining
Because this page has a "Show More" button, I need Selenium to automatically click the "Show More" button iteratively, and then somehow use Beautiful Soup to harvest the links to each individual restaurant review on the page. In the photo below, the link I want to harvest is within the https://...onigiri.html">.
Code so far:
url = "https://www.nytimes.com/reviews/dining"
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get(url)
for i in range(1):
    button = driver.find_element_by_tag_name("button")
    button.click()
How do I use WebDriverWait and BeautifulSoup [BeautifulSoup(driver.page_source, 'html.parser')] to complete this task?
Go to https://www.nytimes.com/reviews/dining, press F12 and then press Ctrl+Shift+C to inspect the Show More element, then, as shown in the picture, get the XPath of the element.
In order to find the XPath please look at:
https://www.techbeamers.com/locate-elements-selenium-python/#locate-element-by-xpath
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def executeTest():
    global driver
    driver.get('https://www.nytimes.com/reviews/dining')
    time.sleep(7)
    element = driver.find_element_by_xpath('Your_Xpath')
    element.click()
    time.sleep(3)

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-infobars")
    driver = webdriver.Chrome(chrome_options=options)

if __name__ == "__main__":
    startWebDriver()
    executeTest()
    driver.quit()
This is a lazy-loading application. To click on the Show More button you need to use an infinite loop: scroll down the page, look for the button, click it, wait some time for the page to load, and store the values in a list. Compare the length of the list before and after each pass; if it is unchanged, break out of the infinite loop.
Code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
driver=webdriver.Chrome()
driver.get("https://www.nytimes.com/reviews/dining")
# To accept the cookie pop-up, click on it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Accept']"))).click()
listhref = []
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.css-gg4vpm")))
    lenlistbefore = len(listhref)
    for ele in elements:
        if ele.get_attribute("href") in listhref:
            continue
        else:
            listhref.append(ele.get_attribute("href"))
    lenlistafter = len(listhref)
    if lenlistbefore == lenlistafter:
        break
    button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//button[text()='Show More']")))
    driver.execute_script("arguments[0].click();", button)
    time.sleep(2)
print(len(listhref))
print(listhref)
Note: I am getting a list count of 499.
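If you still want to hand the fully loaded page to Beautiful Soup, as asked, you can parse driver.page_source once the loop above finishes. A small sketch (it reuses the a.css-gg4vpm selector from the code above; whether every such anchor is a review link is an assumption):
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')
# collect the same anchors the Selenium loop gathered, this time via Beautiful Soup
review_links = [a.get('href') for a in soup.select('a.css-gg4vpm') if a.get('href')]
print(len(review_links))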
As a beginner with Python, I am trying to make a simple automated login project. One more thing I have to do is mouse-click on the 4th row of an HTML table so it shows me the proper content. The HTML code of that segment is:
<tr class="tbl_seznam_barva_1" onclick="setTimeout('__doPostBack(\'ctl02$ctl00$BrowseSql1\',\'Select$0\')',470);" onmouseover="radekSeznamuClass=this.className;this.className='RowMouseOver';" onmouseout="this.className=radekSeznamuClass;">
<td>virtuálny terminál</td>
</tr>
How do I execute this "onclick" event?
from selenium import webdriver
#...
browser = webdriver.Firefox()
elem = browser.find_element_by_name('txtUsername')
elem.send_keys('myLogin' + Keys.RETURN)
elem = browser.find_element_by_xpath("//tr[4]")
# some code for event execution goes here...
If you want to click() the element with the text virtuálny terminál, you can achieve it with:
browser.find_element_by_xpath("//*[text()='virtuálny terminál']").click()
If you need to click on more elements you can use a for-loop on all the elements.
elements = browser.find_elements_by_xpath("//tr[4]")
for i in elements:
    print(i.text)
Edit:
You can use ActionChains:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
my_elem = browser.find_element_by_xpath("//tr[4]")
action = ActionChains(browser)
action.move_to_element(my_elem)
# action.move_to_element_with_offset(my_elem, 5, 5)
action.click()
action.perform()
Edit2:
If you can't use chromedriver and nothing else works, you can use execute_script:
element = browser.find_element_by_xpath("//tr[4]")
browser.execute_script("arguments[0].click();", element)
The problem is that one should wait for the webpage to fully load.
After the line elem.send_keys('myLogin' + Keys.RETURN) the webpage needs time to render the content, so a delay should be added:
import time
# ...
elem.send_keys('myLogin' + Keys.RETURN)
time.sleep(1)
elem=browser.find_element_by_xpath("//tr[4]")
elem.click()
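A fixed sleep works, but if you prefer not to guess the delay, an explicit wait is a common alternative (a sketch; the 10-second timeout is an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

elem.send_keys('myLogin' + Keys.RETURN)
# wait until the 4th table row is clickable instead of sleeping a fixed amount of time
row = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//tr[4]")))
row.click()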
I've written a script in Python to parse some names from a webpage. The items available on that webpage don't all get displayed at once; rather, it is necessary to scroll to the bottom to let the webpage load a few more items, then a few more upon another scroll, and so on until all items are visible. The problem is that the items are not located in the body, which is why driver.execute_script("return document.body.scrollHeight;") is not working (IMO). They are located in a left-hand area, like a sliding container. How can I reach the bottom of that container and parse the names from this webpage? I've written almost all the code except for controlling the lazy load. I'm attaching an image to give you an idea of what I mean by calling it a sliding container.
The link to that webpage: Link
This is what I've tried so far:
from selenium import webdriver; import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("replace_the_above_link")
check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    height = driver.execute_script("return document.body.scrollHeight;")
    if height == check_height:
        break
    check_height = height

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".select_list h2 a"))):
    print(item.text)
driver.quit()
This is the image of the box which contains the items: Click Here
Currently my scraper only parses the items which are visible when the page first loads.
The code below should allow you to trigger the XHR requests by scrolling the container as many times as needed, and then scrape the required data:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("https://www.weedsta.com/dispensaries/in/california")
entries_count = len(wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "select_list"))))
while True:
    driver.find_element_by_class_name("tel").send_keys(Keys.END)
    try:
        wait.until(lambda driver: entries_count < len(driver.find_elements_by_class_name("select_list")))
        entries_count = len(driver.find_elements_by_class_name("select_list"))
    except:
        break

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".select_list h2 a"))):
    print(item.text)
driver.quit()
I want to grab the page source of the page after I make a click, and then go back using the browser.back() function. But Selenium doesn't wait for the page to fully load after the click, so the content generated by JavaScript isn't included in the page source of that page.
element[i].click()
#Need to wait here until the content is fully generated by JS.
#And then grab the page source.
scoreCardHTML = browser.page_source
browser.back()
As Alan mentioned, you can wait for some element to be loaded. Below is example code:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "element_id")))
You can also use Selenium's staleness_of:
from contextlib import contextmanager
from selenium.webdriver.support.expected_conditions import staleness_of

@contextmanager
def wait_for_page_load(browser, timeout=30):
    old_page = browser.find_element_by_tag_name('html')
    yield
    WebDriverWait(browser, timeout).until(
        staleness_of(old_page)
    )
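Used as a context manager, it waits for the old page to go stale after the click; a short sketch of how it could fit the code from the question:
with wait_for_page_load(browser):
    element[i].click()                   # navigation / JS rendering happens here
scoreCardHTML = browser.page_source      # the new page has replaced the old one
browser.back()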
You can do it with a simple loop of try and wait, an easy-to-implement method:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
Button = ''
while not Button:
    try:
        Button = browser.find_element_by_name('NAME OF ELEMENT')
        Button.click()
    except:
        continue
Assuming "pass" is an element on the current page that won't be present on the target page.
I mostly use the id of the link I am going to click on, because it is rarely present on the target page.
while True:
    try:
        browser.find_element_by_id("pass")
    except:
        break