I'm using Python and Selenium. I want to scroll inside a view by pixels rather than by elements, so that I can loop until I've scrolled to the end of the list. As practice, I've been trying to scroll through the whole list of people who liked this Instagram post: https://www.instagram.com/p/BuT_u-UAKn1/ . I know how to scroll by elements:
elements = driver.find_elements_by_xpath("//*[@id]/div/a")
driver.execute_script("return arguments[0].scrollIntoView();", elements[-1])
But I would like to scroll by pixels. I've tried to do the following:
driver.execute_script("return arguments[0].scrollIntoView(true);", elements)
driver.execute_script("window.scrollBy(0,200);")
When doing so, this error occurs:
JavascriptException: Message: TypeError: arguments[0].scrollIntoView is not a function
Anyone knows how to scroll into a view by pixels?
Thanks
Below has worked for me.
#first move to the element
self.driver.execute_script("return arguments[0].scrollIntoView(true);", element)
#then scroll by x, y values, in this case 10 pixels up
self.driver.execute_script("window.scrollBy(0, -10);")
When you scroll by (0, 200), the positive number means scroll DOWN. If you want to scroll UP, use a negative value such as -200.
Also see the documentation here: https://developer.mozilla.org/en-US/docs/Web/API/Window/scrollBy
If you are using a browser that does not support ScrollToOptions, switch to a more widely supported browser.
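For the original goal (scrolling the likes list by pixels until it stops moving), a rough sketch could look like the following. This assumes the likes sit inside a scrollable container in the dialog; the CSS selector is only a guess and will likely need adjusting for the actual Instagram markup:
import time

# Hypothetical selector for the scrollable likes container inside the dialog
scroll_box = driver.find_element_by_css_selector("div[role='dialog'] ul")
last_top = -1
while True:
    # Scroll the container itself down by 200 pixels
    driver.execute_script("arguments[0].scrollTop += 200;", scroll_box)
    time.sleep(1)  # give newly loaded entries time to render
    new_top = driver.execute_script("return arguments[0].scrollTop;", scroll_box)
    if new_top == last_top:
        break  # scrollTop stopped changing, so we reached the end of the list
    last_top = new_top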
Another possible solution is to implement a WebDriverWait for the specific element to be visible in the HTML DOM:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "element_css")))
self.driver.execute_script("return arguments[0].scrollIntoView(true);", element)
Also, you can try using ActionChains:
from selenium.webdriver.common.action_chains import ActionChains
element = driver.find_element_by_id("id")  # the element you want to scroll to
ActionChains(driver).move_to_element(element).perform()
After you move to the element, then you can use the scroll code
You can also try adding an offset. Some webpages will not load new content if you jump all the way down to the bottom; they only load new content as you approach the end of the page.
document.documentElement.scrollHeight-10
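For instance, something along these lines (a sketch) scrolls to just above the bottom instead of all the way down:
# Stop 10 pixels short of the absolute bottom so lazy-loading still triggers
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight - 10);")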
A less conventional way would be to execute javascript within your code.
Also try maximizing your window with Selenium. Sometimes the size of the window affects how Selenium operates:
driver.maximize_window()
findThis = driver.find_element_by_css_selector("CSS SELECTOR HERE")
jsScript = """
function move_up(element) {
    element.scrollTop = element.scrollTop - 1000;
}

function move_down(element) {
    console.log('Position before: ' + element.scrollTop);
    element.scrollTop = element.scrollTop + 1000;
    console.log('Position after: ' + element.scrollTop);
}

move_up(arguments[0]);
"""
driver.execute_script(jsScript, findThis)
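If you want to keep scrolling the element down until it can't go any further, a small loop in the same spirit (a sketch, reusing findThis from above) would be:
last_top = -1
while True:
    # Nudge the element down and read back its scroll position
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollTop + 1000;", findThis)
    new_top = driver.execute_script("return arguments[0].scrollTop;", findThis)
    if new_top == last_top:
        break  # the position no longer changes, so the element is fully scrolled
    last_top = new_top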
Related
I'm using Selenium to parse a webpage that initially shows ~15 rows; on each scroll, the content of those rows is replaced. That is, when you reach the end of the page and inspect it, you will still find those same 15 divs (which share a CSS class), but their content has dynamically changed depending on the height you are currently at.
My relevant block of code is as follows:
content_list = []
# Scroll to the bottom of the page
old_height = driver.execute_script(current_scrollbar_height)
while True:
    elems = driver.find_elements(By.CLASS_NAME, class_)
    for e in elems:
        content_list.append(e.text.replace("\n", ";"))
    driver.execute_script("window.scrollBy(0, 100);")
    new_height = driver.execute_script(current_scrollbar_height)
    if new_height == old_height:
        break
    else:
        old_height = new_height
I've tried, without results:
selecting the parent_div first, using its find_elements() to get the inner rows, scrolling down and then reassigning it with its same selector
sleeping after a scroll down
setting an implicitly_wait() time
using an ActionChain to scroll
Worth noting is that if I manually scroll down and then only call find_elements() the content of the rows does change at different heights.
EDIT
I figured out what the issue is: if the browser window is left unfocused (i.e. not visible on screen), the page scrolls but the content doesn't update. If I leave the browser window visible, it scrolls and updates.
So, the most natural workaround was to call maximize_window() so that the window stays on top until the script exits.
But... isn't there a better solution?
How can I scroll down in a certain element of a webpage in Selenium?
Basically my goal is to scroll down in this element until new profile results stop loading.
Let's say that there should be 100 profile results that I'm trying to gather.
By default, the webpage will load 30 results.
I need to scroll down IN THIS SECTION, wait a few seconds for 30 more results to load, repeat (until all results have loaded).
I am able to count the number of results with:
len(driver.find_elements(By.XPATH, "//div[@class='virtual-box']"))
I already have all the other code written, I just need to figure out the line of code to get Selenium to scroll down like 2 inches.
I've looked around a bunch and can't seem to find a good answer (that or I suck at googling).
This is a section of my code:
(getting the total number of profiles currently on the page = max_prof)
while new_max_prof > max_prof:
    scroll_and_wait(profile_number)
    if max_prof != new_max_prof:  # to make sure that they are the same
        max_prof = new_max_prof
...and here is the function that it is calling (which currently doesn't work because I can't get it to scroll)
def scroll_and_wait(profile_number=profile_number):  # This doesn't work yet
    global profile_xpath
    global new_max_prof
    global max_prof
    print('scrolling!')
    # driver.execute_script("window.scrollTo(0,1080);")  # does not work
    temp_xpath = profile_xpath + str(max_prof) + ']'
    element = driver.find_element(By.XPATH, temp_xpath)
    ActionChains(driver).scroll_to_element(element).perform()  # scrolls to the last profile
    element.click()  # selects the last profile
    # Tested and this does not seem to load the new profiles unless you scroll down.
    print('did the scroll!!!')
    time.sleep(5)
    new_max_prof = int(len(driver.find_elements(By.XPATH, "//div[@class='virtual-box']")))
    print('new max prof is: ' + str(new_max_prof))
    time.sleep(4)
I tried:
#1. driver.execute_script("window.scrollTo(0,1080);") and driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") but neither seemed to do anything.
#2. ActionChains(driver).scroll_to_element(element).perform() hoping that if I scrolled to the last profile on the page, it would load the next one (it doesn't)
#3. Using pywin32 win32api.mouse_event(MOUSEEVENTF_WHEEL, -300, 0) to simulate the mouse scrolling. Didn't seem to work, but even if it did, I'm not sure this would solve it because it would really need to be in the element of the webpage. Not just going to the bottom of the webpage.
OKAY! I found something that works. (If anyone knows a better solution please let me know)
You can use this code to scroll to the bottom of the page:
driver.find_element(By.TAG_NAME, 'html').send_keys(Keys.END) # works, but not inside element.
What I had to do was more complicated though (since I am trying to scroll down IN AN ELEMENT on the page, and not just to the bottom of the page).
IF YOUR SCROLL BAR HAS ARROW BUTTONS at the top/bottom, try just clicking them with .click() or .click_and_hold(); that's a much easier solution than trying to scroll and does the same thing.
IF, LIKE ME, YOUR SCROLL BAR HAS NO ARROW BUTTONS, you can still click on the scroll bar track at the bottom/top and it will move. If you find the XPATH of your scrollbar and click it, the click lands in the middle (not helpful), but you can offset it on the x/y axis with ".move_by_offset(x, y)". For example:
# import ActionChains
from selenium.webdriver.common.action_chains import ActionChains
scroll_bar_xpath = "//div[@ng-if='::vm.isVirtual']/div[@class='ps-scrollbar-y-rail']"
element = driver.find_element(By.XPATH, scroll_bar_xpath)
# Do stuff
ActionChains(driver).move_to_element(element).move_by_offset(0,50).click().perform()
Now normally, you wouldn't want to use a fixed pixel amount (50 on the y axis) because if you change the browser size, or run the program on a different monitor, it could mess up.
To solve this, you just need to figure out the size of the scroll bar, so that you know where the bottom of it is. All you have to do is:
element = driver.find_element(By.XPATH, scroll_bar_xpath)
size = element.size
w = size['width']
h = size['height']
print('size is: ' + str(size))
print(h)
print(w)
This will give you the size of the element. You want to click at the bottom of it, so you'd think you could just take the height and pass it into move_by_offset like this: ".move_by_offset(0, h)". You can't do that, because when you select an element, the offset starts from its middle, so you want to cut that number in half (and round it down so you don't end up with a decimal). This is what I ended up doing, and it worked:
# import ActionChains
from selenium.webdriver.common.action_chains import ActionChains
import math
scroll_bar_xpath = "//div[@ng-if='::vm.isVirtual']/div[@class='ps-scrollbar-y-rail']"
element = driver.find_element(By.XPATH, scroll_bar_xpath)
size = element.size
w = size['width']
h = size['height']
#Calculate where to click
click_place = math.floor(h / 2)
# Do Stuff
ActionChains(driver).move_to_element(element).move_by_offset(0, click_place).click().perform() #50 worked
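A variation on the same idea (a sketch, not taken from the page above) is to drag the scrollbar handle itself instead of clicking the rail, using click_and_hold plus move_by_offset; the handle XPath here is hypothetical and will depend on your page:
# Hypothetical handle element inside the rail; adjust the XPath to your page
handle = driver.find_element(By.XPATH, scroll_bar_xpath + "/div")
ActionChains(driver).click_and_hold(handle).move_by_offset(0, click_place).release().perform()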
Hope it helps!
I am currently attempting to scrape a Dropbox folder using Selenium in Python. Apparently, if I try to select all hyperlinks (or all elements containing hyperlinks), I only get the first 20 or so results. To give a minimal working example:
from selenium import webdriver
browser = webdriver.Chrome()
page = "https://www.dropbox.com/FolderName"
browser.get(page)
elementlist = browser.find_elements_by_class_name('brws-file-name-cell-filename')
# or alternatively, you can simply use the find_elements_by_tag_name('a') method, which yields similar results
elength = len(elementlist)
Usually, elength is on the order of 20 to 30 elements, which grows to 30 to 40 when I add a command to scroll down to the bottom of the page. I know for a fact that there are well over 200 elements in the folder I am trying to scrape. My question is, thus: is there any way to scroll down the page progressively, rather than going all the way to the bottom right away? I have seen that many questions asked on this topic focus on pages with infinite loading, like Facebook or other social media. My page, on the other hand, has a fixed length. Is there a way I can scroll down step by step, rather than all at once?
UPDATE
I tried following the advice given to me by the community and the answer you can find here. Unfortunately, I am still struggling to iterate over the height, which is my variable of interest and which seems to be stuck inside a string. This has been my best attempt at creating a loop over the height, and needless to say, it still did not work:
# Get current height
height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down
    browser.execute_script('window.scrollTo(0, window.scroll'+str(height)+' + 200)')
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == height:
        break
    else:
        height = new_height
UPDATE 2
I think I've found the issue. Dropbox basically has a 'page within the page' structure. The whole of the page is visible to me, but there's an inner archive which I need to navigate. Any idea how to do that?
You could try this answer. Instead of going straight to the bottom, you could create a loop with a fixed height step and iterate until you reach the bottom.
browser.execute_script('window.scrollTo(0, window.scroll'+str(height)+' + 200)')
The second argument inside the JavaScript call seems odd to me. Let's assume your height variable is 800px; then this is the JavaScript that ends up running inside execute_script (execute_script is a Selenium method which lets you run JavaScript):
window.scrollTo(0, window.scroll800 + 200)
I assume this will throw an error and stop the execution. I think you should change your code to this:
browser.execute_script('window.scrollTo(0,'+str(height)+' + 200)')
This code will scroll your window to the bottom of the page. (One tip: you can just open the devtools console in your browser and try the JavaScript code there; if it works, you can come back to Selenium.) At this point you should make your driver instance sleep. Once the page has loaded (make sure to give it enough time), assign the new height value to a new variable. If the page has loaded more elements at the bottom, the first height and the new height will differ, which means another scroll to the bottom is required. Before that scroll, assign the new height to the first height variable, so that in the next iteration your first height is the second height from the previous loop.
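Put together, the loop described above could look roughly like this (a sketch; SCROLL_PAUSE_TIME is whatever delay the page needs to load new content):
import time

SCROLL_PAUSE_TIME = 2  # adjust to how long the page needs to load new elements

height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll 200px past the previously recorded height
    browser.execute_script('window.scrollTo(0, ' + str(height) + ' + 200)')
    # Give the page time to load new content
    time.sleep(SCROLL_PAUSE_TIME)
    # Compare the new document height with the previous one
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == height:
        break  # nothing new loaded, so this is the real bottom
    height = new_height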
I'm doing some scraping with Selenium in Python. My problem is that after I find all the WebElements, I'm unable to get their info (id, text, etc.) if the element is not actually VISIBLE in the browser window opened by Selenium.
What I mean is:
First image
Second image
As you can see from the first and second images, the first 4 "tables" are "visible" to me and to the code. There are, however, 2 other tables (5 & 6, Gettho lucky dip & Sue Specs) that are not "visible" until I drag down the right scrollbar.
Here's what I get when I try to get the element info, without "seeing it" in the page:
Third image
Manually dragging the page to the bottom, and therefore making it "visible" to the human eye (and also to the code?), is the only way I can get the data from the WebDriver element I need:
Fourth image
What am I missing? Why can't Selenium do it in the background? Is there a way to solve this problem without going up and down the page?
PS: the page could be any kind of dog race page in http://greyhoundbet.racingpost.com/. Just click City - Time - and then FORM.
Here's part of my code:
# I call this function with the URL and it returns the driver object
def open_main_page(url):
    chrome_path = r"c:\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get(url)
    # Wait for page to load
    loading(driver, "//*[@id='showLandingLADB']/h4/p", 0)
    element = driver.find_element_by_xpath("//*[@id='showLandingLADB']/h4/p")
    element.click()
    # Wait for second element to load, after click
    loading(driver, "//*[@id='landingLADBStart']", 0)
    element = driver.find_element_by_xpath("//*[@id='landingLADBStart']")
    element.click()
    # Wait for main page to load.
    loading(driver, "//*[@id='whRadio']", 0)
    return driver
Now I have the browser "driver" which I can use to find the elements I want
url = "http://greyhoundbet.racingpost.com/#card/race_id=1640848&r_date=2018-
09-21&tab=form"
browser = open_main_page(url)
# Find dog names
names = []
text: str
tags = browser.find_elements_by_xpath("//strong")
Now "TAGS" is a list of WebDriver elements as in the figures.
I'm pretty new to this area.
UPDATE:
I've solved the problem with a code workaround.
tags = driver.find_elements_by_tag_name("strong")
for tag in tags:
    driver.execute_script("arguments[0].scrollIntoView();", tag)
    print(tag.text)
In this manner, the browser moves to the element's position and is able to read its information.
However, I still have no idea why, with this page in particular, I'm unable to read elements that are not visible in the browser area until I scroll and literally see them.
I'm trying to scroll to the bottom of the web page, but it scrolls only once and stays at that position, leaving a big part of the page unreached.
I use this: _inst.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Do you know where the problem could be?
EDIT: Is there a way to tell Selenium that it has to scroll to the absolute bottom of the page, or that it should repeat the scroll a certain number of times, for example 5?
To scroll to the bottom of the page, you can send a CTRL+END to one of its elements:
from selenium.webdriver.common.keys import Keys
element = driver.find_element_by_ ...
element.send_keys(Keys.CONTROL , Keys.END)
To find the element, there are many options available.
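For example, sending the keys to the body tag is usually enough (a sketch using the same old-style API as above; the choice of body is an assumption):
from selenium.webdriver.common.keys import Keys

# Send CTRL+END to the page body; any focusable element should also work
element = driver.find_element_by_tag_name('body')
element.send_keys(Keys.CONTROL, Keys.END)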
2 easy ways:
hardcode so it goes all the way down for sure:
_inst.driver.execute_script("window.scrollTo(0, 10000);")
or find the location of an element at the bottom of the page and scroll to its location:
element = _inst.driver.find_element_by_tag_name('footer')
position = element.location['y']
_inst.driver.execute_script("window.scrollTo(0, arguments[0]);", position)
(The y coordinate is passed in as arguments[0] so that the Python variable actually reaches the JavaScript.)
I tried this and it worked for me.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")