I'm using Python Selenium with Firefox to save data from a webpage into a spreadsheet, but the page continually updates its data, causing stale element errors. How do I resolve this?
I've tried turning off JavaScript, but that doesn't seem to do anything. Any suggestions would be great!
If you want to save the data on the page at a specific moment in time, you can:
1. get the current page's HTML source via the WebDriver.page_source property
2. write it into a file
3. open the file from disk using the WebDriver.get() method
That's it: you should be able to work with the local copy of the page, which will never change.
Example code:
driver.get("http://seleniumhq.org")
with open("mypage.html", "w") as mypage:
mypage.write(driver.page_source)
mypage.close()
driver.get(os.getcwd() + "/" + (mypage.name))
#do what you need with the page source
Another approach is to call the WebDriver.find_element function fresh wherever you need to interact with the element.
So instead of
myelement = driver.find_element_by_xpath("//your_selector")
# some other action
myelement.get_attribute("interestingAttribute")
perform the find every time you need to interact with the element:
driver.find_element_by_xpath("//your_selector").get_attribute("interestingAttribute")
Or, even better, go for an explicit wait for the element you need:
WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "//your/selector"))).get_attribute("href")
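For reference, that one-liner relies on the usual Selenium support imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC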
I want to iterate over the Google search results, clicking each one and copying the menus of each site. So far I can copy the menus and return to the results page, but I can't iterate over clicking the results. For now I would like to learn iterating over the search results alone, but I'm stuck at a stale element reference exception. I did see a few other sources, but no luck.
from selenium import webdriver

chrome_path = r"C:\Users\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.google.com?q=python#q=python')
weblinks = driver.find_elements_by_xpath("//div[@class='g']//a[not(@class)]")
for link in weblinks[0:9]:
    print(link.get_attribute("href"))
    link.click()      # navigating away is what invalidates the remaining references
    driver.back()
StaleElementReferenceException means that the elements you are referring to no longer exist. That usually happens when the page is redrawn. In your case you navigate to another page and back, so the elements are guaranteed to be redrawn.
The default solution is to perform the search again, inside the loop, on every iteration.
If you want to be sure the list is the same on every iteration, you need to add some additional check (compare texts, etc.).
If you use this code for scraping, you probably don't need the back navigation at all. Just open every page directly with driver.get(href), as in the sketch below.
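A minimal sketch of that idea (it reuses the question's XPath, an assumption, and reads all the href values up front so nothing goes stale):
weblinks = driver.find_elements_by_xpath("//div[@class='g']//a[not(@class)]")
hrefs = [link.get_attribute("href") for link in weblinks[0:9]]  # read before navigating
for href in hrefs:
    driver.get(href)  # open each result page directly, no click()/back() needed
    # ...scrape the menus of this site here...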
Here you can find a code example: How to open a link in new tab (chrome) using Selenium WebDriver?
I try to use Selenium for my automation project.
In this case I realized that the find_elements_by_tag_name() function returns only the elements currently displayed in the browser, so I basically send PAGE_DOWN and run the function again.
The question: is there any way to run find_elements_by_tag_name on the whole loaded content in Selenium without scrolling down the page?
For example, in my case I use this:
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()
browser.get(url)
images = browser.find_elements_by_tag_name("img")
browser.find_element_by_tag_name("body").send_keys(Keys.PAGE_DOWN)
I don't want to send PAGE_DOWN because the whole page is already loaded in the browser.
Note: browser.page_source is not a solution.
Thanks for the help.
I have a page, say, https://jq.profinance.ru/html/htmlquotes/site2.jsp, which is updated every second. My aim is to parse values using Selenium.
driver = webdriver.Chrome()
driver.get(url)
mylist = []
my_tables = driver.find_elements_by_tag_name('table')       # operation 1
for table in my_tables:
    for tr in table.find_elements_by_tag_name('tr'):        # operation 2
        mylist.append(tr)
The problem is that Python assigns my variable my_tables a reference to the live WebElement objects, not a copy of their values. Hence I do not get correct data, as there is some lag between operations 1 and 2.
How can I copy the webpage HTML structure and then use Selenium commands to walk through the structure of my document?
I tried pickle, get_attribute("innerHTML"), and .page_source, but they do not work properly, as they only copy the string object, not a structure I can walk with Selenium commands.
I don't think you can do exactly what you're trying to do with Selenium alone. Selenium "drives" a running web browser, and if the JavaScript in that browser updates the contents of the page every second or so, you'll have these timing problems.
What you can do is use Selenium to drive the browser to get a snapshot of the page's HTML as a string (exactly as you describe in your last paragraph).
Then you can use a library like Beautiful Soup to parse the HTML string and extract the data that you need.
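For instance, a minimal sketch of that snapshot-then-parse flow (it assumes BeautifulSoup 4 is installed; that the quote values live in ordinary table rows is an assumption about the page):
from bs4 import BeautifulSoup

html = driver.page_source                 # one-time snapshot of the live page
soup = BeautifulSoup(html, "html.parser")
for tr in soup.find_all("tr"):            # we parse a plain string, so nothing can go stale
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    print(cells)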
After some time I found the solution:
1. Dump the page source into a string and save it locally as an HTML file.
2. Open the HTML file locally.
3. If you want to get back to the website, call driver.back().
I often open hundreds of tabs when using web browsers, and this slows my computer down. So I want to write a browser manager in Python and Selenium which opens tabs and can save the URLs of those tabs, so I can reopen them later.
But it seems the only way to get the URL of a tab in Python Selenium is to switch to it and read driver.current_url.
I'm wondering if there's a way to get the URL of a tab without switching to it?
There is no way to get a specific tab's URL or title without switching to that tab, as Selenium needs focus on that tab's DOM tree to perform any operation.
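A minimal sketch of the switching approach: walk the window handles, record each URL, then restore the original tab.
original = driver.current_window_handle
urls = []
for handle in driver.window_handles:     # one handle per open tab/window
    driver.switch_to.window(handle)      # focus has to move to read the URL
    urls.append(driver.current_url)
driver.switch_to.window(original)        # return to where we started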
Alternatively, go to the link that opens the other tab and, instead of clicking it, save its href attribute into a string or list.
I am not sure about your actual scenario, but we can get the list of all hyperlinks present on the current page. The idea is to collect all web elements with tag "a" and then get their "href" attribute values. Below is sample code in Java. Kindly modify it accordingly.
//Collecting all hyperlink elements
List<WebElement> allLinks = driver.findElements(By.tagName("a"));
//For each hyperlink element, getting its target href value
for (WebElement link : allLinks)
{
    System.out.println(link.getAttribute("href"));
}
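Since the rest of this thread is Python, the same idea as a Python sketch:
# collect every anchor on the current page and print its target
all_links = driver.find_elements_by_tag_name("a")
for link in all_links:
    print(link.get_attribute("href"))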
Hope this helps you.
Thanks.
My recommendation is to use an extension for that, or write/extend your own. There seem to be some of that type, like
https://addons.mozilla.org/en-US/firefox/addon/export-tabs-urls-and-titles/
or
https://chrome.google.com/webstore/detail/save-all-tab-urls/bgjfbcjoaghcfdhnnnnaofkjbnelkkcm?hl=en-GB
To my knowledge, there is no way of getting/accessing the URL of a webpage without first switching to it.
I have a website that populates its content using JSP.
I am trying to use Selenium to scrape the content there. When I open the page, I sleep for a few seconds to wait until the page has fully loaded (I can see by eye that the data has finished populating).
However, when I do browser.find_elements_by_class... I cannot find any element! I don't know how I can solve that issue in Selenium.
Check and see if your elements are inside of a frame or iframe.
http://selenium-python.readthedocs.org/en/latest/navigating.html#moving-between-windows-and-frames documents the python version, which turns out to be:
driver.switch_to.frame("framename")
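Putting it together, a small hypothetical sketch ("framename" and the class name are placeholders for your page):
driver.switch_to.frame("framename")                     # enter the iframe first
elements = driver.find_elements_by_class_name("data")   # hypothetical class name
driver.switch_to.default_content()                      # back to the top-level document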