I am trying to click through all next pages until the last page, which the script does successfully on this website. However, it reaches the last page and then waits indefinitely. How can I best get this script to proceed to the rest of the code once it reaches the last page? I could use an explicit wait with a 15-second timeout, but that feels very slow and not the best way of doing this. Thanks
Full code:
i = 1
while i < 6:
    try:
        time.sleep(2)
        #time.sleep(random, 3)
        WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".name:nth-child(1)")))
        WebDriverWait(driver, 100).until(lambda driver: driver.execute_script('return document.readyState') == 'complete')
        element = WebDriverWait(driver, 20).until(lambda driver: driver.find_element(By.CSS_SELECTOR, ".name:nth-child(1) , bf-coupon-table:nth-child(1) tr:nth-child(1) .matched-amount-value"))
        scroll = driver.find_element(By.CSS_SELECTOR, ".coupon-page-navigation__label--next")
        driver.execute_script("arguments[0].scrollIntoView();", scroll)
        link = driver.find_element_by_css_selector('[href^=http://somelink.com/]')
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "bf-coupon-page-navigation > ul > li:nth-child(4) > a")))
        NextStory = WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'bf-coupon-page-navigation > ul > li:nth-child(4) > a')))
        link = driver.find_element_by_css_selector('bf-coupon-page-navigation > ul > li:nth-child(4) > a')
        NextStory.click()
    except:
        i = 6
You are not getting to the except block; otherwise the loop would stop.
I think that this line is causing the slowness:
driver.execute_script("arguments[0].scrollIntoView();", scroll)
That is because all your other actions are driver actions with timeouts. Try clicking on the webpage and then pressing the page-down key in a loop:
from selenium.webdriver.common.keys import Keys
# Focus a static element in your page, e.g. <body>, and send the key to it
body = driver.find_element_by_tag_name('body')
for _ in range(100):
    body.send_keys(Keys.PAGE_DOWN)
So I came from the question here.
Now I am able to interact with the page, scroll down the page, close the popup that appears, and click at the bottom to expand the page.
The problem is that when I count the items, the code only returns 20 when it should be 40.
I have checked the code again and again - I'm missing something, but I don't know what.
See my code below:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import datetime
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
#options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=r"C:\\chromedriver.exe", options=options)
url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'
driver.get(url)
iter = 1
while True:
    scrollHeight = driver.execute_script("return document.documentElement.scrollHeight")
    Height = 10*iter
    driver.execute_script("window.scrollTo(0, " + str(Height) + ");")
    if Height > scrollHeight:
        print('End of page')
        break
    iter += 1

time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()
time.sleep(3)

ver_mas = driver.find_elements_by_class_name('button-load-more')
for x in range(len(ver_mas)):
    if ver_mas[x].is_displayed():
        driver.execute_script("arguments[0].click();", ver_mas[x])
        time.sleep(10)

page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
# print(soup)
items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
What is wrong? I'm a newbie in the scraping world.
Regards
Your while and for statements don't work as intended.
Using while True: is a bad practice
You scroll until the bottom - but the button-load-more button isn't displayed there - and Selenium will not find it as displayed
find_elements_by_class_name - looks for multiple elements - the page has only one element with that class
if ver_mas[x].is_displayed(): if you are lucky this will be executed only once because the range is 1
Below you can find the solution - here the code looks for the button, moves to it instead of scrolling, and performs a click. If the code fails to find the button - meaning that all the items were loaded - it breaks the while and moves forward.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'
driver.get(url)
time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()

iter = 1
while iter > 0:
    time.sleep(3)
    try:
        ver_mas = driver.find_element_by_class_name('button-load-more')
        actions = ActionChains(driver)
        actions.move_to_element(ver_mas).perform()
        driver.execute_script("arguments[0].click();", ver_mas)
    except NoSuchElementException:
        break
    iter += 1

page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
# print(soup)
items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
I am trying to automate this Instagram link. I need to keep scrolling and fetch all links. I am trying the following, but it is not working.
def fetch_links_by_hashtag(hash_tag):
    url = 'https://www.instagram.com/explore/tags/marketing/'
    driver.get(url)
    driver.implicitly_wait(20)
    is_more = False
    try:
        elem_more = wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Load more")))
        elem_more.click()
        is_more = True
    except Exception as ex:
        print(str(ex))
    pop = driver.find_element_by_tag_name('footer')
    #pop = driver.find_element_by_link_text('About us')
    # pop = driver.find_element_by_class_name('_4gt3b')
    if pop is not None:
        for i in range(10):
            print('Calling scrolling script')
            # It scrolls till end
            driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', pop)
            sleep(4)
        html = pop.get_attribute('innerHTML')
        print(html)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
How to scroll down to the bottom of a page ?
In addition to 宏杰李's answer:
driver.execute_script("return arguments[0].scrollIntoView();", element_obj)
Also, if you want to make an extra scroll:
driver.execute_script("return arguments[0].parentNode.scrollTop = "
                      "arguments[0].parentNode.scrollTop + {extra_scroll}"
                      .format(extra_scroll=extra_scroll_pixels), element_obj)
My entire code:
def _scroll_to_element(driver, element, extra_scroll=None):
    # Scroll to element
    driver.execute_script("return arguments[0].scrollIntoView();", element)
    # Scroll parentNode with the extra pixels (if provided)
    if extra_scroll:
        driver.execute_script(
            "return arguments[0].parentNode.scrollTop = "
            "arguments[0].parentNode.scrollTop + {extra_scroll}".format(
                extra_scroll=str(extra_scroll)), element)
I'm having issues with the Selenium WebDriver's runtime. I'm opening an array of 10 URLs and scraping some content from each.
As time goes on and Selenium opens the fourth URL, it gets extremely slow... if I let the task continue, it can't finish; Python aborts the process because the run time is exceeded.
Imagine: scraping the first URL takes 1 minute, the second 1-2 minutes, the third 4 minutes, ..., and then it breaks.
I need a workaround for this issue. I'm using IPython Notebook with Python 2.7.
PS: Do you think opening the url in different tabs could help?
Edit: This is how I create browser:
chromeOptions = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images":2,
"profile.default_content_setting_values.notifications" : 2,}
chromeOptions.add_experimental_option("prefs",prefs)
chromeOptions.add_argument("--window-position=0,0")
browser = webdriver.Chrome(chrome_options=chromeOptions)
This is the task being run for each URL in the array:
browser.get(url)
lastHeight = browser.execute_script("return document.body.scrollHeight")
while True:
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    newHeight = browser.execute_script("return document.body.scrollHeight")
    if newHeight == lastHeight:
        break
    lastHeight = newHeight

start = 'Por '
end = ' com'
html_source = browser.page_source
soup = BeautifulSoup(html_source)
cl = soup.find_all('div', attrs={'class': 'cl'})
names = [None] * len(cl)
for i in range(len(cl)):
    try:
        names[i] = re.search('%s(.*)%s' % (start, end), cl[i].text).group(1)
    except:
        continue
photosof = list(set(names))
Unfortunately, Selenium's performance is highly dependent on time; it degrades very fast. The only solution I found was to close and reopen the driver.
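A minimal sketch of that workaround, recycling the driver after every few URLs (the batch size and the way results are collected are illustrative, not part of the original answer):

```python
def batched(urls, n):
    """Split urls into chunks of size n; one driver session is used per chunk."""
    return [urls[i:i + n] for i in range(0, len(urls), n)]

def scrape_in_batches(urls, per_session=3):
    # Sketch only: requires selenium and a matching chromedriver to actually run.
    from selenium import webdriver
    results = []
    for batch in batched(urls, per_session):
        browser = webdriver.Chrome()
        for url in batch:
            browser.get(url)
            results.append(browser.page_source)
        browser.quit()  # closing the session frees the memory that builds up
    return results
```

Note that closing and reopening the driver loses cookies and session state, so if the site requires a login, it has to be repeated for each batch.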
I am currently using selenium webdriver to parse through facebook user friends page and extract all ids from the AJAX script. But I need to scroll down to get all the friends. How can I scroll down in Selenium. I am using python.
You can use
driver.execute_script("window.scrollTo(0, Y)")
where Y is the height (on a fullhd monitor it's 1080). (Thanks to #lukeis)
You can also use
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
to scroll to the bottom of the page.
If you want to scroll through a page with infinite loading, like social network ones (Facebook etc.), you can use this (thanks to @Cuong Tran):
SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
Another method (thanks to Juanse) is to select an object and:
label.sendKeys(Keys.PAGE_DOWN);
You can use send_keys to simulate an END (or PAGE_DOWN) key press (which normally scrolls the page):
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
html = driver.find_element(By.TAG_NAME, 'html')
html.send_keys(Keys.END)
element = driver.find_element_by_xpath("xpath of the li you are trying to access")
element.location_once_scrolled_into_view
This helped when I was trying to access an 'li' that was not visible.
For my purposes, I wanted to scroll down further while keeping the window's current position in mind. My solution was similar and used window.scrollY:
driver.execute_script("window.scrollTo(0, window.scrollY + 200)")
which will go to the current y scroll position + 200
This is how you scroll down the webpage:
driver.execute_script("window.scrollTo(0, 1000);")
None of these answers worked for me, at least not for scrolling down a Facebook search result page, but after a lot of testing I found this solution:
while driver.find_element_by_tag_name('div'):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Divs = driver.find_element_by_tag_name('div').text
    if 'End of Results' in Divs:
        print('end')
        break
    else:
        continue
The easiest way I found to solve that problem was to select a label and then send:
label.sendKeys(Keys.PAGE_DOWN);
Hope it works!
When working with YouTube, the floating elements give "0" as the scroll height, so rather than using "return document.body.scrollHeight", try "return document.documentElement.scrollHeight".
Adjust the scroll pause time to your internet speed; otherwise it will run only once and then break.
SCROLL_PAUSE_TIME = 1

# Get scroll height
# "return document.body.scrollHeight" doesn't work here due to
# floating web elements on YouTube
last_height = driver.execute_script("return document.documentElement.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        print("break")
        break
    last_height = new_height
For scroll-loading pages (for example: Medium, Quora, etc.):
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
    # Wait for the page to load. Note: implicitly_wait only sets an
    # element-lookup timeout; it does not pause the script, so use time.sleep.
    time.sleep(30)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

driver.quit()
Here's an example selenium code snippet that you could use for this type of purpose. It goes to the url for youtube search results on 'Enumerate python tutorial' and scrolls down until it finds the video with the title: 'Enumerate python tutorial(2020).'
driver.get('https://www.youtube.com/results?search_query=enumerate+python')
target = driver.find_element_by_link_text('Enumerate python tutorial(2020).')
target.location_once_scrolled_into_view
This code scrolls to the bottom but doesn't require that you wait each time. It will continually scroll, and then stop at the bottom (or time out).
from selenium import webdriver
import time
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://example.com')
pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
run_time, max_run_time = 0, 1
while True:
    iteration_start = time.time()
    # Scroll webpage, the 100 allows for a more 'aggressive' scroll
    driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')
    scrolled = post_scroll_height != pre_scroll_height
    timed_out = run_time >= max_run_time

    if scrolled:
        run_time = 0
        pre_scroll_height = post_scroll_height
    elif not scrolled and not timed_out:
        run_time += time.time() - iteration_start
    elif not scrolled and timed_out:
        break

# closing the driver is optional
driver.close()
This is much faster than waiting 0.5-3 seconds each time for a response, when that response could take 0.1 seconds
I was looking for a way of scrolling through a dynamic webpage, and automatically stopping once the end of the page is reached, and found this thread.
The post by @Cuong Tran, with one main modification, was the answer that I was looking for. I thought that others might find the modification helpful (it has a pronounced effect on how the code works), hence this post.
The modification is to move the statement that captures the last page height inside the loop (so that each check is comparing to the previous page height).
So, the code below:
Continuously scrolls down a dynamic webpage (.scrollTo()), only stopping when, for one iteration, the page height stays the same.
(There is another modification, where the break statement is inside another condition (in case the page 'sticks') which can be removed).
SCROLL_PAUSE_TIME = 0.5

while True:
    # Get scroll height
    ### This is the difference. Moving this *inside* the loop
    ### means that it checks if scrollTo is still scrolling
    last_height = driver.execute_script("return document.body.scrollHeight")

    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # try again (can be removed)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")

        # check if the page height has remained the same
        if new_height == last_height:
            # if so, you are done
            break
        # if not, move on to the next loop
        else:
            last_height = new_height
            continue
You can use send_keys to simulate a PAGE_DOWN key press (which normally scrolls the page):
from selenium.webdriver.common.keys import Keys
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
If you want to scroll within a particular view/frame (WebElement), all you need to do is replace "body" with the particular element that you intend to scroll within. I get that element via "getElementById" in the example below:
self.driver.execute_script('window.scrollTo(0, document.getElementById("page-manager").scrollHeight);')
this is the case on YouTube, for example...
The scrollTo() approach didn't work for me. This is what I used, and it worked fine.
driver.execute_script("document.getElementById('mydiv').scrollIntoView();")
According to the docs, the ActionChains class does the job. Since Selenium 4.2 it exposes dedicated scroll actions such as scroll_by_amount:
from selenium import webdriver
from selenium.webdriver import ActionChains

driver = webdriver.Firefox()
action_chains = ActionChains(driver)
action_chains.scroll_by_amount(0, 500).perform()  # delta_x, delta_y in pixels
Insert this line:
driver.execute_script("window.scrollBy(0, 925);")
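A sketch that generalizes that one-liner: repeat window.scrollBy in fixed steps until the scroll position stops changing (the step size, pause, and iteration cap are illustrative):

```python
import time

def scroll_by_steps(driver, step=925, pause=0.5, max_steps=1000):
    """Scroll down in fixed increments, stopping once the page no longer moves."""
    for _ in range(max_steps):
        before = driver.execute_script("return window.scrollY")
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)
        if driver.execute_script("return window.scrollY") == before:
            break  # bottom reached (or the page refused to scroll further)
```

Unlike jumping straight to document.body.scrollHeight, this also gives lazily loaded content a chance to appear as each step scrolls past it.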
The loop using the "send keys" method of scrolling the page:
pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
while True:
    driver.find_element_by_tag_name('body').send_keys(Keys.END)
    time.sleep(5)
    post_scroll_height = driver.execute_script('return document.body.scrollHeight;')
    print(pre_scroll_height, post_scroll_height)
    if pre_scroll_height == post_scroll_height:
        break
    pre_scroll_height = post_scroll_height
Here is a method I wrote to slowly scroll down to a target element.
You can pass it either the Y position of the element or a CSS selector.
It scrolls exactly as we do with a mouse wheel.
Once this method has been called, you can call it again with the same driver object but a new target element, and it will scroll up or down to wherever that element is.
import random
import time

def slow_scroll_to_element(driver, element_selector=None, target_yth_location=None):
    current_scroll_position = int(driver.execute_script("return window.scrollY"))
    if element_selector:
        target_yth_location = int(driver.execute_script(
            "return document.querySelector('{}').getBoundingClientRect()['top']"
            " + window.scrollY".format(element_selector)))
    scrollSpeed = 100 if target_yth_location - current_scroll_position > 0 else -100

    def chunks(a, n):
        k, m = divmod(len(a), n)
        return (a[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n))

    positions = (list(range(current_scroll_position, target_yth_location, scrollSpeed))
                 + [target_yth_location + (-scrollSpeed if scrollSpeed > 0 else scrollSpeed)])
    for l in list(chunks(positions, 3)):
        for pos in l:
            driver.execute_script("window.scrollTo(0, " + str(pos) + ");")
            time.sleep(0.1)
        time.sleep(random.randint(1, 3))
driver.execute_script("document.getElementById('your ID Element').scrollIntoView();")
This works in my case.
Just a small variation on the solutions provided so far: sometimes in scraping you have to meet the following requirements:
Keep scrolling step by step. Otherwise, if you always jump straight to the bottom, some elements are loaded only as containers/divs, but their content is never loaded because it was never visible;
Allow enough time for content to be loaded;
It's not an infinite scroll page; there is an end, and you have to identify when the end is reached.
Here is a simple implementation:
from time import sleep
def keep_scrolling_to_the_bottom():
    while True:
        previous_scrollY = my_web_driver.execute_script('return window.scrollY')
        my_web_driver.execute_script('window.scrollBy(0, 230)')
        sleep(0.4)
        if previous_scrollY == my_web_driver.execute_script('return window.scrollY'):
            print('job done, reached the bottom!')
            break
Tested and working on Windows 7 x64, Python 3.8.0, selenium 4.1.3, Google Chrome 107.0.5304.107, website for property rent.
Scroll to an element: find the element and scroll to it using this code.
scroll_element = driver.find_element(By.XPATH, "your element xpath")
driver.execute_script("arguments[0].scrollIntoView();", scroll_element)
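One variation worth knowing: scrollIntoView accepts an options object (this is standard DOM behavior, not Selenium-specific), so you can center the element in the viewport instead of pinning it to the top edge, which keeps it clear of sticky headers:

```python
def scroll_element_to_center(driver, element):
    """Scroll so the element sits in the middle of the viewport."""
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'});",
        element)
```

Called as `scroll_element_to_center(driver, scroll_element)` with the element found above.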