Scroll to the bottom of a web page - python

I'm trying to write a little script that looks at the main page of a website and finds ads.
The problem is that some web pages use infinite scroll. If this code were built for one particular web page, I could locate elements and scroll to them.
But I can't figure out how to make Selenium scroll to the very bottom of an arbitrary page:
self.driver.execute_script("window.scrollTo(0, something);")
PS: If the page is very long, stop after several seconds of scrolling.
Do you know how to do that?

Here's another method that I used in Java: get the window size and then scroll by that amount using JavaScript. Here's how to do it in Java (hopefully you can implement the concept in Python too):
int pageHeight = driver.manage().window().getSize().getHeight();
((JavascriptExecutor) driver).executeScript("window.scrollBy(0," + pageHeight + ")");
If you are implementing infinite scroll, you can put the executeScript() call in a loop. Hope it helps.
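A rough Python translation of the same idea (a sketch, not from the original answer; the example.com URL and the 10-second cap are placeholders), which also honours the asker's "break it down after several seconds" requirement:

import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

page_height = driver.get_window_size()["height"]
deadline = time.time() + 10  # give up after ~10 seconds on huge pages
last_offset = -1
while time.time() < deadline:
    # scroll down by one window height, as in the Java snippet above
    driver.execute_script("window.scrollBy(0, arguments[0]);", page_height)
    time.sleep(0.5)  # let lazy-loaded content appear
    offset = driver.execute_script("return window.pageYOffset;")
    if offset == last_offset:
        break  # scroll position unchanged: bottom reached
    last_offset = offset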


How can I click "invisible" reCAPTCHA buttons using Selenium web automation?

I am using Python and Selenium to automate this website: https://prenotami.esteri.it
The script I made fills out a form and then clicks a button to advance to the next page. These actions are carried out using Selenium's find_element_by_xpath() function. Recently, the website added a reCAPTCHA that pops up after the button is clicked, and must be completed before advancing.
I have already written a Python script that can bypass this type of captcha using the audio option. However, on this particular website I am not able to find the XPath to the reCAPTCHA's audio button. Although there is an iframe that contains the reCAPTCHA, there seems to be nothing inside it.
In the first attached image you can see what this website's reCAPTCHA looks like in HTML, compared to another website, visible in the second image, where a #document can be seen inside the iframe.
My intention is to run this program using headless Chrome, so I can't rely on any mouse-control functions offered by, for example, pyautogui.
I've been scratching my head around this problem for a while, so any advice is useful. Thanks!
Edit: after some research I have found that this type of reCAPTCHA, which doesn't require checking an "I am not a robot" checkbox, is called "invisible reCAPTCHA". The captcha only pops up if the detected activity is suspicious (for example, clicking too fast). I have tried adding random waits and movements to mimic human behaviour, but the captcha still appears after a few tries. Since I don't think there is a way to prevent the captcha from appearing 100% of the time, the question of how to click its buttons using Selenium's find_element_by_xpath() function remains the same. Leaving this as a note in case someone finds it useful.
Have you ever tried the following argument?
add_argument("-auto-open-devtools-for-tabs")
With it, I managed to interact with the captcha.
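For reference, a minimal sketch of wiring that option into a Python Selenium session (assuming Chrome; the option string is the one suggested above):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("-auto-open-devtools-for-tabs")  # option suggested above
driver = webdriver.Chrome(options=options)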
If the position is always fixed, you can use PyAutoGUI to move the mouse and click on it:
import pyautogui
pyautogui.click(100, 100) # button coordinates
Since it is in an iframe, we need to switch Selenium into the iframe and then use the XPath.
driver.switch_to.frame("c-la7g7xqfbit4")
captcha_btn = driver.find_element_by_xpath("(//button[@id='recaptcha-audio-button'])[2]")
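Note that frame ids like "c-la7g7xqfbit4" are usually generated dynamically, so as a sketch of a more robust variant (an assumption on my part, not part of the original answer), you can locate the reCAPTCHA iframe by its src and switch to the element itself:

# locate the reCAPTCHA iframe by its src instead of a generated id
iframe = driver.find_element_by_xpath("//iframe[contains(@src, 'recaptcha')]")
driver.switch_to.frame(iframe)
captcha_btn = driver.find_element_by_xpath("//button[@id='recaptcha-audio-button']")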

Why doesn't window.scrollTo() work on pages like YouTube?

I am trying to scrape with Selenium, but I need to load all the content of the page by scrolling to the end of the site. But when I execute driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
the program doesn't do anything at all. I wonder if it's because the page I'm scraping has a customized scrollbar.
That's because document.body.scrollHeight is zero on those pages, so the call doesn't scroll anything.
You can scroll to an arbitrarily large value, or use document.documentElement.scrollHeight instead.
Check this question for more details.
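A minimal sketch of that workaround (pages like YouTube scroll the document element rather than the body, so document.documentElement.scrollHeight is the non-zero height):

# scroll the document element instead of the zero-height body
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")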

Determining the end of a web page

I am trying to automate scrolling down a web page written in React Native and taking a screenshot of the entire thing. I've solved the scrolling by sending PAGE_DOWN via send_keys.
I am trying to find the end of the page so I know when to stop taking screenshots. My problem is that the page's length is dynamic, depending on the information displayed. It has collapsible sections that are all expanded at once. To make it more fun, the dev team decided not to add ids or any unique identifiers because "it's written in React".
I've tried the following:
Looking for an element at the bottom of the page: the element reports as 'visible' regardless of where it is on the page. I've tried different elements with the same result.
Determining clientHeight, offsetHeight, and scrollHeight via JavaScript: the number returned doesn't change no matter how many times the page is paged down, so either I'm not using them right or they won't work here. I'm at a loss right now.
I'm running Python with Selenium on a Chrome browser (hoping that the solution can be translated to IE).
You can keep taking the Y coordinate of the vertical scroll bar element each time you scroll down.
While it keeps changing, you have not yet reached the page bottom.
Once the bottom is reached, the previous value will equal the current Y coordinate of that element.
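A sketch of that idea in Python, using window.pageYOffset as a stand-in for the scroll bar's Y coordinate (an assumption; the original answer tracked the scroll bar element itself):

import time

from selenium.webdriver.common.keys import Keys

body = driver.find_element_by_tag_name("body")
last_y = -1
while True:
    # take the screenshot for this viewport, e.g. driver.save_screenshot(...)
    body.send_keys(Keys.PAGE_DOWN)  # same paging approach as in the question
    time.sleep(0.5)  # let the scroll settle before measuring
    y = driver.execute_script("return window.pageYOffset;")
    if y == last_y:
        break  # Y coordinate unchanged: bottom of the page reached
    last_y = y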
One way I know of in the Selenium Python bindings:
driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

Extracting info from dynamic page element in Python without "clicking" to make visible?

For the life of me I can't think of a better title...
I have a Python WebDriver-based scraper that goes to Google and enters a local search such as chiropractors+new york+ny, which, after clicking on More chiropractors+New York+NY, ends up on a page like this
The goal of the scraper is to grab the phone number and full address (including suite # etc.) of each of the 20 results on such a results page. To do so, I need to have WebDriver click each of the 20 entries to bring up an overlay over the Google Map:
This is mighty slow. Were it not for having to trigger each of these overlays, I could do everything up to that point with the much faster lxml, by going straight to the final URL of the results page and extracting via XPath. But I appear to be stuck: I can't get the data out of the overlay without first clicking the link that brings it up.
Is there a way to get the data out of this page element without having to click the associated links?

Weird thing with Selenium test (Python) on an AngularJS-based UI

We are testing a web application based on AngularJS, and I have encountered this issue twice now. This time I need to click a dropdown embedded in a link tag. I can manipulate it manually from IPython, but once it runs in a script, the dropdown popup does not appear the way it does when driven from the terminal.
Do you have any idea about this?
Judging from your comment "there is no error actually, the element is clicked", I would suspect that the script is running fast enough to click the element before the JavaScript actions have been bound to it. You can verify this by adding a deliberate pause:
import time
time.sleep(4)
If the action works when there is a deliberate pause then you can be quite sure that it is a race condition between the JavaScript being bound and Selenium clicking the element.
How you deal with this is up to you. You could mark the DOM in some way when the events had been bound. That's a technique I've used in the past.
You could execute a bit of JavaScript in a loop that returns some information about the global state of the page, and use that to decide when the page is ready to interact with.
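As a rough sketch of that polling idea for an AngularJS page (the $http pendingRequests check is an assumption about the app's setup, not something stated above):

import time

def wait_for_angular(driver, timeout=10):
    # True once AngularJS is present and has no HTTP requests in flight
    script = (
        "return (window.angular !== undefined) && "
        "(angular.element(document.body).injector() !== undefined) && "
        "(angular.element(document.body).injector()"
        ".get('$http').pendingRequests.length === 0);"
    )
    deadline = time.time() + timeout
    while time.time() < deadline:
        if driver.execute_script(script):
            return True
        time.sleep(0.5)
    return False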
