Locating Lazy Load Elements While Scrolling in PhantomJS in Python

I'm using Python and WebDriver to scrape data from a page that dynamically loads content as the user scrolls down (lazy loading). There are 30 data elements in total, but only 15 are present before scrolling down.
After scrolling to the bottom of the page enough times for every element to load, I locate the elements and read their values like this:
# Get all data items
all_data = self.driver.find_elements_by_css_selector('div[some-attribute="some-attribute-value"]')

# Iterate through each item and collect its value
data_value_list = []
for d in all_data:
    # Get the value for this data item
    data_value = d.find_element_by_css_selector('div[class="target-class"]').get_attribute('target-attribute')
    # Save the value to the list
    data_value_list.append(data_value)
When I execute the above code using ChromeDriver with the browser window left up on my screen, all 30 data values populate data_value_list. When I execute the same code using ChromeDriver with the window minimized, data_value_list is only populated with the initial 15 values.
The same issue occurs while using PhantomJS, limiting my data_value_list to only the initially-visible data values on the page.
Is there a way to load these elements while the browser is minimized and, ideally, while using PhantomJS?
NOTE: I'm scrolling down with an action chain, calling .send_keys(Keys.PAGE_DOWN).perform() a calculated number of times.
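For context, a minimal sketch of that PAGE_DOWN loop, using the same legacy Selenium bindings as the code above; the URL, scroll count, and pause length are placeholders:

import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

driver = webdriver.PhantomJS()
driver.get("https://example.com/lazy-page")  # placeholder URL

for _ in range(10):  # calculated number of scrolls for the page
    ActionChains(driver).send_keys(Keys.PAGE_DOWN).perform()
    time.sleep(0.5)  # give the lazy loader time to fetch the next batch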

I had the exact same issue. The solution I found was to execute JavaScript in the browser to force the scrollable element to scroll to the bottom.
Before putting the JavaScript command into Selenium, I recommend opening your page in Firefox and inspecting the elements to find the scrollable container. The element should encompass all of the dynamic rows, but it should not include the scrollbar itself. Once you have selected the element with JavaScript, you can scroll it to the bottom by setting its scrollTop attribute to its scrollHeight attribute.
You should then test scrolling the content in the browser. The easiest way to select the element is by ID, if it has one, but other locators will work too. To select an element with the id "scrollableContent" and scroll it to the bottom, execute the following in your browser's JavaScript console:
e = document.getElementById('scrollableContent'); e.scrollTop = e.scrollHeight;
Of course, this only scrolls the content to the current bottom; you will need to repeat it after new content loads if you have to scroll multiple times. Also, I don't know of a reliable way to find the exact element; for me it was trial and error.
This is some code I tried out. It can be improved, though, and should be for anything meant to test code or scrape unpredictable pages. I couldn't figure out how to explicitly wait until more elements had loaded (one idea: get the current number of elements, scroll to the bottom, then wait for at least one more element to show up, and exit the loop if none does), so I hardcoded 5 scroll events and used time.sleep. time.sleep is ugly and can lead to issues, partly because it depends on the speed of your machine.
import time

def scrollElementToBottom(driver, element_id):
    time.sleep(0.2)
    for i in range(5):
        driver.execute_script(
            "e = document.getElementById('" + element_id + "'); e.scrollTop = e.scrollHeight;")
        time.sleep(0.2)
The caveat is that this solution was tested with the Firefox driver, but I see no reason why it shouldn't work with your setup.
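For what it's worth, here is a sketch of the explicit-wait idea described above: count the loaded rows, scroll, then wait for the count to grow, and stop once it doesn't. The element id and row selector are assumptions for illustration:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

def scroll_until_exhausted(driver, element_id, row_selector, timeout=5):
    while True:
        count = len(driver.find_elements_by_css_selector(row_selector))
        # Scroll the container to its current bottom
        driver.execute_script(
            "var e = document.getElementById(arguments[0]);"
            "e.scrollTop = e.scrollHeight;", element_id)
        try:
            # Wait until at least one new row has been added to the container
            WebDriverWait(driver, timeout).until(
                lambda d: len(d.find_elements_by_css_selector(row_selector)) > count)
        except TimeoutException:
            break  # no new rows appeared; assume the end was reached

# Hypothetical usage, with placeholder id and selector:
# scroll_until_exhausted(driver, 'scrollableContent', 'div[some-attribute]')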

Related

Determining the end of a web page

I am trying to automate scrolling down a web page written in React Native and taking a screenshot of the entire thing. I've solved the scrolling by sending PAGE_DOWN via send_keys.
Now I am trying to find the end of the page so I know when to stop taking screenshots. My problem is that the page is dynamic in length, depending on the information displayed. It has collapsible sections that are all expanded at once. To make it more fun, the dev team decided not to add ids or any unique identifiers because "it's written in React".
I've tried the following:
Looking for an element at the bottom of the page: the element reports as 'visible' regardless of where it is on the page. I've tried different elements with the same result.
Determining the clientHeight, offsetHeight, and scrollHeight via JavaScript: the numbers returned don't change no matter how many times the page has been moved down, so either I'm not using them right or they won't work. I'm at a loss right now.
I'm running Python with Selenium on a Chrome browser (hoping the solution can be translated to IE).
You can keep taking the Y coordinate of the vertical scroll position each time you perform the scroll-down.
While it keeps changing, you have not yet reached the page bottom.
Once the bottom is reached, the previous value will be equal to the current Y coordinate.
One way to scroll down with the Selenium Python bindings:
driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
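Combining the two ideas, a sketch (assuming driver is an existing WebDriver instance and a one-second pause is enough for new content to load):

import time

get_height = "return (document.scrollingElement || document.body).scrollHeight;"
scroll_down = ("var s = (document.scrollingElement || document.body);"
               "s.scrollTop = s.scrollHeight;")

last_height = driver.execute_script(get_height)
while True:
    driver.execute_script(scroll_down)
    time.sleep(1)  # let the lazy content render before re-measuring
    new_height = driver.execute_script(get_height)
    if new_height == last_height:
        break  # the height stopped growing, so the bottom was reached
    last_height = new_height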

Scrolling an element which loads further on scrolling using Selenium in Python

I am trying to create an Instagram scraping bot that collects a list of Followers and Following using Python + Selenium.
However, that list keeps loading as the user scrolls, until it is exhausted. (The original question included a screenshot of the followers dialog, with some content hidden for privacy.)
Now, I believe I have two ways to achieve this:
Keep reading the usernames, and then keep on scrolling.
Keep scrolling till the end, and then read all usernames together from the source code.
I've been attempting the second method, but I can't work out how to know when there is no more content to scroll. How can I achieve this, given that I don't know anything in advance about the length of this element?
Reason for not using Method 1: When scrolling, the DOM keeps getting refreshed, so it is hard to keep track of which usernames have been read.
One way to do this is to keep track of the number of child elements in the div that contains the li elements for the followers. If it doesn't increase after a scroll event, you've reached the end of the list.
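A rough sketch of that approach; the dialog locator is a guess, since Instagram's markup changes often and carries no stable ids:

import time

# Hypothetical locator for the scrollable list inside the followers dialog
followers_list = driver.find_element_by_xpath("//div[@role='dialog']//ul")

last_count = 0
while True:
    items = followers_list.find_elements_by_tag_name("li")
    if len(items) == last_count:
        break  # the child count did not grow: end of the list
    last_count = len(items)
    # Scroll the last loaded item into view to trigger the next batch
    driver.execute_script("arguments[0].scrollIntoView();", items[-1])
    time.sleep(1)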

Find all elements on a web page using Selenium and Python

I am trying to go through a webpage with Selenium and create a set of all elements with certain class names, so I have been using:
elements = set(driver.find_elements_by_class_name('class name'))
However, in some cases there are thousands of elements on the page (if I scroll down), and I've noticed that this code only finds the first 18-20 elements on the page (only about 14-16 are visible to me at once). Do I need to scroll, or am I doing something else wrong? Is there any way to instantaneously get all of the elements I want in the HTML into a list without having to visually see them on the screen?
It depends on your webpage. Just look at the HTML source code (or the network log) before you scroll down. If there are only the 18-20 elements, then the page lazy-loads the next items (as Twitter or Instagram do). This means the server only renders the next items once you reach a certain point on the page. Otherwise all thousand items would be loaded at once, which would increase the page size, loading time, and server load.
In this case, you have to scroll down until the end and then get the source code to parse all items.
You could use more advanced methods, like treating each chunk as a page in a pagination scheme (i.e. instead of "go to next page", "scroll down"). But if you're a beginner, I would start by simply scrolling down to the end (scroll, wait, scroll, ... until there are no new elements), then fetching the HTML and parsing it.
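A sketch of that scroll-wait-repeat loop, keeping 'class name' as the placeholder from the question:

import time

last_count = -1
while True:
    elements = driver.find_elements_by_class_name('class name')
    if len(elements) == last_count:
        break  # scrolling produced no new elements
    last_count = len(elements)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # crude wait for the next chunk to render

# elements now holds every lazily loaded item
print(len(elements))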

Iterating google search results using python selenium

I want to iterate through the Google search results, clicking each one and copying the menus of each site. So far I can copy the menus and return to the results page, but I can't iterate over the results by clicking them. For now I'd like to learn how to iterate through the search results alone, but I'm stuck on a stale element reference exception. I've looked at a few other sources, but no luck.
from selenium import webdriver

chrome_path = r"C:\Users\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.google.com?q=python#q=python')

weblinks = driver.find_elements_by_xpath("//div[@class='g']//a[not(@class)]")
for links in weblinks[0:9]:
    print(links.get_attribute("href"))
    links.click()
    driver.back()
StaleElementReferenceException means that the elements you are referring to no longer exist. That usually happens when the page is redrawn. In your case, you navigate away and back, so the elements are guaranteed to be redrawn.
The default solution is to re-find the list inside the loop on every iteration.
If you want to be sure the list is the same on every iteration, you need to add an additional check (compare link texts, etc.).
If you use this code for scraping, you probably don't need back navigation at all. Just open every page directly with driver.get(href).
You can find a code example here: How to open a link in new tab (chrome) using Selenium WebDriver?
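A sketch of that last suggestion: collect the hrefs first, then open each page with driver.get, so no stored element reference can go stale:

weblinks = driver.find_elements_by_xpath("//div[@class='g']//a[not(@class)]")
hrefs = [link.get_attribute("href") for link in weblinks[0:9]]

for href in hrefs:
    print(href)
    driver.get(href)  # navigate directly instead of click() followed by back()
    # scrape the menu of the opened site here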

Using (Python) Webdriver to select text without using an element (i.e. click and drag to highlight from one set of coordinates to another set)

I am trying to select some text (i.e. highlight it with the mouse cursor) for an automated test. I would like to use Python and webdriver to go to this url: http://en.wikipedia.org/wiki/WebDriver#Selenium_WebDriver and highlight the second sentence under the heading 'Selenium WebDriver' ("Selenium WebDriver accepts commands (sent in Selenese, or via a Client API) and sends them to a browser.")
The tricky part is that I was hoping this could be done without using any elements: I've been trying to work out a way to click at a location specified by x and y coordinates, then hold the button while moving to another location specified by a different pair of coordinates.
From reading around, I understand that it is not possible to click on an arbitrary area of the page by coordinates, as you need to specify an element. So can the text selection be done using only a single, remote element (say ".mw-editsection>a")? I was thinking it might be possible to use the element as a reference and click a certain distance away from it (i.e. click by offset).
This is what I've attempted so far, but it's not doing the job:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Firefox()
actions = ActionChains(driver)
driver.get("http://en.wikipedia.org/wiki/WebDriver#Selenium_WebDriver")
the_only_element = ".mw-editsection>a"
element = driver.find_element_by_css_selector(the_only_element)
actions.move_to_element_with_offset(element, 50, 50)
actions.click_and_hold(on_element=None)
actions.move_by_offset(50, 50)
actions.release()
actions.perform()
From this, I get this error:
WebDriverException: Message: u"'UnknownError: Cannot press more then one button or an already pressed button.' when calling method: [wdIMouse::down]"
Background:
Background: I appreciate that the above example is a bit contrived, but I can't actually share the thing I'm really trying to test. What I'm actually doing is writing a series of tests in Python using WebDriver to test our document viewer, and I really need to be able to highlight a row of text, as this is how you add comments in our system. Unfortunately, the document viewer doesn't show the submitted document itself, only an image of it via some kind of JavaScript wizardry. The document page is an element, but there are no elements for WebDriver to click on within the page itself.
Because of this, I want to be able to click and hold at a location on the page specified by coordinates (at the start of the sentence), keep the mouse button held while the cursor moves right to a second set of coordinates (at the end of the sentence), and then release the button.
TL;DR:
Is it possible to click and drag from an arbitrary point on a webpage to another, without using an element (other than as a reference from which the arbitrary point's offset is defined)?
If not, what other method could you suggest to highlight an area of text and could you provide a working example?
Thanks!
Have you tried drag_and_drop_by_offset?
actions.drag_and_drop_by_offset(element, 50, 50)
actions.perform()
I think I've worked it out. The following does seem to work, but I can't get it to let go! This might be down to a buggy implementation of .release() in the Python bindings for WebDriver:
def click_and_drag(locator, x_from, y_from, x_to, y_to):
    element = driver.find_element_by_css_selector(locator)
    actions.move_to_element(element)
    actions.move_by_offset(x_from, y_from)
    actions.click_and_hold(on_element=None)
    actions.move_by_offset(x_to, y_to)
    actions.release(on_element=None)
    actions.perform()
