Selenium doesn't read the second page - python

By default page 1 opens. I am clicking "next page" using mores.click(), and the click works in the browser, but when I read the HTML it is still the first page. How do I make sure that I read the second page?
This is my code:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://colleges.niche.com/stanford-university/reviews/')
mores = driver.find_element_by_class_name('icon-arrowright-thin--pagination')
mores.click()
vkl = driver.page_source
print(vkl)

You are probably reading the source too quickly. Add a wait after your click and make sure that the second page has actually appeared on screen before you try to read the HTML source.
Keep in mind that Selenium will not automatically wait for the second page to load, completely or at all. It executes the next command, driver.page_source, immediately.

Related

How to get new page source after navigating Python Selenium

I am facing an issue.
I navigate on the page via Selenium Chrome. I have timeouts and WebDriverWait as I need a full page to get JSON out of it.
Then I click the navigation button with
driver.execute_script("arguments[0].click();", element)
as normal click never worked.
And the navigation works; I can see Selenium browsing normally. No problem.
But driver.page_source still returns the first page, the one I loaded via the get() method.
All timeouts are the same as for the first page, and I can see the new pages normally, but page_source never updates.
What am I doing wrong?
After navigating to the new page, get the current URL (note that in Python current_url is a property, not a method):
url = driver.current_url
and then reload it so the source matches what is on screen:
driver.get(url)
print(driver.page_source)

Getting to the last page in a website when the 'go to last page' button doesn't work

I need to get to the last page in the following site link. This is because when I right click on a row I would be able to export all previous rows along with it as a csv file. That way, I can download the complete data present in the website.
But the problem is that the Go to last page option doesn't work. So I am currently using selenium to click through the next page button to eventually reach the last page. But it takes a lot of time.
This is the code I am currently using.
from selenium import webdriver
import time
url = "https://mahabocw.in/essential-kit-benefits-distribution/"
driver = webdriver.Chrome()
driver.get(url)
for i in range(1000000):
    next_button = '/html/body/div[1]/div[6]/div/article/div/div/div/div/div[2]/div/div/div[2]/div/div[4]/span[2]/div[3]/button'
    click_next = driver.find_element_by_xpath(next_button)
    click_next.click()
Is there any way I could modify the code on the website to make the Go to last page button work, so I can jump to that page and download all the data? Or is there a better technique I could adopt using Selenium?
Any help or suggestions would be really helpful. Thanks a lot in advance.
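One possible sketch (the helper is hypothetical, and whether the site disables or removes the next button on the last page is an assumption you would need to confirm against the real markup) is to keep clicking until the button stops being clickable rather than counting iterations:

```python
import time

def click_until_last_page(driver, next_button_xpath, pause=0.5):
    """Click the next-page button repeatedly until it disappears or is
    reported as disabled, i.e. until the last page is reached."""
    while True:
        buttons = driver.find_elements_by_xpath(next_button_xpath)
        if not buttons or not buttons[0].is_enabled():
            break  # no usable next button left: we are on the last page
        buttons[0].click()
        time.sleep(pause)  # give the table time to render the next page
```
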

How to use find_elements_by_tag_name in selenium for all loaded content on browser?

I am trying to use Selenium for my automation project.
In this case I noticed that find_elements_by_tag_name() only returns the elements currently displayed in the browser, so I have been sending PAGE_DOWN and running the function again.
The question is: is there any way to run find_elements_by_tag_name() over the whole loaded content in Selenium without scrolling down the page?
For example, in my case I use this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()
browser.get(url)
images = browser.find_elements_by_tag_name("img")
browser.find_element_by_tag_name("body").send_keys(Keys.PAGE_DOWN)
I don't want to send PAGE_DOWN because I already have whole page in the browser.
Note: browser.page_source is not solution.
Thanks for your help.
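For what it's worth, find_elements_by_tag_name already searches the whole loaded DOM, not just the visible viewport; elements are usually missing because the site lazy-loads them on scroll. If the content really is in the DOM, a sketch that queries it directly via JavaScript (the helper name is mine) looks like:

```python
def all_elements_by_tag(driver, tag):
    """Return every element with the given tag in the loaded DOM,
    queried via JavaScript rather than by scrolling the viewport."""
    return driver.execute_script(
        "return Array.from(document.getElementsByTagName(arguments[0]));",
        tag)
```
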

How to use driver.current_url on a new tab opened by .click() on Selenium for Python

I am writing a Python script that uses BeautifulSoup for scraping and Selenium for navigation. After navigating to another site by calling .click() on a link, I want to use .current_url to get the URL to pass to BeautifulSoup. The problem is that .click() opens the link in a new tab, so when I use current_url I get the URL of the original site.
I tried using:
second_driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)
to change tabs but it has no effect on the current_url function so it didn't work.
Simplified code:
second_driver = webdriver.Firefox()
second_driver.get(<the original url>)
time.sleep(1)
second_driver.find_element_by_class_name(<html for button that opens new tab >).click()
time.sleep(1)
url=second_driver.current_url
So I want url to be the new site after click not the original url
Thank you and sorry if its obvious I am a beginner.
You'll need to switch to the new tab. driver.window_handles is a list of open tabs/windows. This snippet will switch you to the second open handle:
second_driver.switch_to.window(second_driver.window_handles[1])
If you want to go back to where you started, use [0]. If you want to always go to the last tab opened, use [-1]. If you try to switch to window_handles[1] before it exists, you'll raise an IndexError.

Refreshing DOM so Selenium Web Driver can find element

I'm trying to use Selenium's Chrome web driver to navigate to a page and then fill out a form. The problem is that the page loads and then 5 seconds later displays the form. So JavaScript changes the DOM after 5 seconds. I think this means that the form's html id doesn't exist in the source code the web driver receives.
This is what the form looks like with Chrome's inspect feature:
However that html doesn't appear in the page's source html.
Python used to find the element:
answerBox = driver.find_element_by_xpath("//form[@id='answer0problem2']")
How would I access the input field within this form?
Is there a way to refresh the web driver without changing the page?
You're running into this problem because you didn't give the website enough time to load.
Use time.sleep() like this:
import time
from bs4 import BeautifulSoup

driver.get('http://your.website.com')
time.sleep(15)
plain_text = driver.page_source
soup = BeautifulSoup(plain_text, 'lxml')
This works because Selenium drives a separate browser process that is not affected by the Python sleep: during the sleep the browser keeps working and finishes loading the website.
It's helpful to add a wait after each Selenium action to account for page load. Because the Python process only talks to the browser when you call the driver, calling it before the page has loaded can cause problems like the one you described.
