I'm trying to make a web scraper for this website. The idea is that the code iterates over all institutions by selecting the institution's name (3B-Wonen in the first instance), closes the pop-up screen, clicks the download button, and repeats this for every item in the list.
However, after the first loop it throws a StaleElementReferenceException when selecting the second institution. From what I've read, this implies that the elements defined in the first loop are no longer accessible. I've read multiple posts, but I have no idea how to overcome this particular case.
Can anybody point me in the right direction? By the way, I'm using Python's Selenium bindings and I'm quite a beginner in programming, so I'm still learning. If you could point me in a general direction that would help me a lot! The code I have is the following:
#importing and setting up parameters for geckodriver/firefox
...
# webpage
driver.get("https://opendata-dashboard.cijfersoverwonen.nl/dashboard/opendata-dashboard/beleidswaarde")
WebDriverWait(driver, 30)  # note: a bare WebDriverWait without .until() does not actually wait
# Get rid of cookie notification
# driver.find_element_by_class_name("cc-compliance").click()
# Store position of download button
element_to_select = driver.find_element_by_id("utilsmenu")
action = ActionChains(driver)
WebDriverWait(driver, 30)  # same note: no .until(), so no actual wait
# Drop down menu
driver.find_element_by_id("baseGeo").click()
# Add institutions to array
corporaties=[]
corporaties = driver.find_elements_by_xpath("//button[@role='option']")
# Iteration
for i in corporaties:
    i.click()  # select institution
    driver.find_element_by_class_name("close-button").click()  # close pop-up screen
    action.move_to_element(element_to_select).perform()  # move to download button
    driver.find_element_by_id("utilsmenu").click()  # click download button
    driver.find_element_by_id("utils-export-spreadsheet").click()  # pick export to Excel
    driver.find_element_by_id("baseGeo").click()  # open drop-down menu for next iteration
This code worked for me, but note that I am not performing the actual export click, driver.find_element_by_id("utils-export-spreadsheet").click():
from selenium import webdriver
import time
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome(executable_path="path")
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://opendata-dashboard.cijfersoverwonen.nl/dashboard/opendata-dashboard/beleidswaarde")
act = ActionChains(driver)
driver.find_element_by_xpath("//a[text()='Sluiten en niet meer tonen']").click() # Close pop-up
# Get the count of options
driver.find_element_by_id("baseGeoContent").click()
cor_len = len(driver.find_elements_by_xpath("//button[contains(@class,'sel-listitem')]"))
print(cor_len)
driver.find_element_by_class_name("close-button").click()
# No need to start from 0, since 1st option is already selected. Start from downloading and then move to next items.
for i in range(1, cor_len-288):  # Tried only for 5 items
    act.move_to_element(driver.find_element_by_id("utilsmenu")).click().perform()
    # Code to click on downloading option
    print("Downloaded:{}".format(driver.find_element_by_id("baseGeoContent").get_attribute("innerText")))
    driver.find_element_by_id("baseGeoContent").click()
    time.sleep(3)  # Takes time to load.
    coritems = driver.find_elements_by_xpath("//button[contains(@class,'sel-listitem')]")
    coritems[i].click()
    driver.find_element_by_class_name("close-button").click()
driver.quit()
Output:
295
Downloaded:3B-Wonen
Downloaded:Acantus
Downloaded:Accolade
Downloaded:Actium
Downloaded:Almelose Woningstichting Beter Wonen
Downloaded:Alwel
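If you do want the actual export step as well, here is a hedged sketch of how it could slot in right after the utils menu is opened; the utils-export-spreadsheet id is taken from the question's code, and the explicit wait is an assumption about load timing:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# After the utils menu has been opened in the loop above,
# wait until the export entry is clickable, then click it.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "utils-export-spreadsheet"))
).click()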
Problem Explanation:
The problem here is that you have defined a list, corporaties = driver.find_elements_by_xpath("//button[@role='option']"), and are then iterating over it and clicking each element, which may cause a redirection to a new page, open a new tab, etc.
So when Selenium tries to interact with the second WebElement from the same list, it has to come back to the original page, and the moment it comes back, all the stored elements become stale.
Solution:
One of the basic solutions in cases like this is to define the list again inside the loop, so that the elements won't be stale. See the illustration below:
Code:
corporaties = driver.find_elements_by_xpath("//button[@role='option']")
# Iteration
for i in range(len(corporaties)):
    # Re-locate the option list on every pass so the references are fresh
    elements = driver.find_elements_by_xpath("//button[@role='option']")
    elements[i].click()  # select institution
    driver.find_element_by_class_name("close-button").click()  # close pop-up screen
    action.move_to_element(element_to_select).perform()  # move to download button
    driver.find_element_by_id("utilsmenu").click()  # click download button
    driver.find_element_by_id("utils-export-spreadsheet").click()  # pick export to Excel
    driver.find_element_by_id("baseGeo").click()  # open drop-down menu for next iteration
    time.sleep(2)
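As a side note, in Selenium 4 the find_element_by_* helpers were removed, so a modern version of the same re-find idea would look roughly like this; the locators are the ones from above, and the explicit waits replacing the fixed sleeps are an assumption (treat this as a sketch, not a drop-in):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 30)
count = len(driver.find_elements(By.XPATH, "//button[@role='option']"))
for i in range(count):
    # Re-find the option list on every pass so the references are fresh
    options = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//button[@role='option']")))
    options[i].click()
    wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "close-button"))).click()
    driver.find_element(By.ID, "utilsmenu").click()
    driver.find_element(By.ID, "utils-export-spreadsheet").click()
    driver.find_element(By.ID, "baseGeo").click()  # reopen the drop-down for the next pass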
How can I scroll down in a certain element of a webpage in Selenium?
Basically my goal is to scroll down in this element until new profile results stop loading.
Let's say that there should be 100 profile results that I'm trying to gather.
By default, the webpage will load 30 results.
I need to scroll down IN THIS SECTION, wait a few seconds for 30 more results to load, repeat (until all results have loaded).
I am able to count the number of results with:
len(driver.find_elements(By.XPATH, "//div[@class='virtual-box']"))
I already have all the other code written, I just need to figure out the line of code to get Selenium to scroll down like 2 inches.
I've looked around a bunch and can't seem to find a good answer (that or I suck at googling).
This is a section of my code:
(getting the total number of profiles currently on the page = max_prof)
while new_max_prof > max_prof:
    scroll_and_wait(profile_number)
    if max_prof != new_max_prof:  # to make sure that they are the same
        max_prof = new_max_prof
...and here is the function that it is calling (which currently doesn't work because I can't get it to scroll)
def scroll_and_wait(profile_number=profile_number):  # This doesn't work yet
    global profile_xpath
    global new_max_prof
    global max_prof
    print('scrolling!')
    # driver.execute_script("window.scrollTo(0,1080);")  # does not work
    temp_xpath = profile_xpath + str(max_prof) + ']'
    element = driver.find_element(By.XPATH, temp_xpath)
    ActionChains(driver).scroll_to_element(element).perform()  # scrolls to the last profile
    element.click()  # selects the last profile
    # Tested, and this does not seem to load the new profiles unless you scroll down.
    print('did the scroll!!!')
    time.sleep(5)
    new_max_prof = int(len(driver.find_elements(By.XPATH, "//div[@class='virtual-box']")))
    print('new max prof is: ' + str(new_max_prof))
    time.sleep(4)
I tried:
#1. driver.execute_script("window.scrollTo(0,1080);") and driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), but neither seemed to do anything.
#2. ActionChains(driver).scroll_to_element(element).perform() hoping that if I scrolled to the last profile on the page, it would load the next one (it doesn't)
#3. Using pywin32 win32api.mouse_event(MOUSEEVENTF_WHEEL, -300, 0) to simulate mouse scrolling. It didn't seem to work, but even if it did, I'm not sure it would solve the problem, because the scrolling really needs to happen inside the element of the webpage, not just go to the bottom of the page.
OKAY! I found something that works. (If anyone knows a better solution please let me know)
You can use this code to scroll to the bottom of the page (Keys comes from selenium.webdriver.common.keys):
driver.find_element(By.TAG_NAME, 'html').send_keys(Keys.END)  # works, but not inside an element
What I had to do was more complicated though (since I am trying to scroll down IN AN ELEMENT on the page, and not just to the bottom of the page).
IF YOUR SCROLL BAR HAS ARROW BUTTONS at the top/bottom, try just clicking them with .click() or .click_and_hold(); that's a much easier solution than trying to scroll, and it does the same thing.
IF, LIKE ME, YOUR SCROLL BAR HAS NO ARROW BUTTONS, you can still click on the scroll bar track at the bottom/top and it will move. If you find the XPath to your scrollbar and click it, it will click in the middle (not helpful), but you can offset this on the x/y axis with .move_by_offset(x, y), for example:
# import ActionChains
from selenium.webdriver.common.action_chains import ActionChains
scroll_bar_xpath = "//div[@ng-if='::vm.isVirtual']/div[@class='ps-scrollbar-y-rail']"
element = driver.find_element(By.XPATH, scroll_bar_xpath)
# Do stuff
ActionChains(driver).move_to_element(element).move_by_offset(0,50).click().perform()
Now normally, you wouldn't want to use a fixed pixel amount (50 on the y axis) because if you change the browser size, or run the program on a different monitor, it could mess up.
To solve this, you just need to figure out the size of the scroll bar, so that you know where the bottom of it is. All you have to do is:
element = driver.find_element(By.XPATH, scroll_bar_xpath)
size = element.size
w = size['width']
h = size['height']
print('size is: ' + str(size))
print(h)
print(w)
This will give you the size of the element. You want to click at the bottom of it, so you'd think you can just take the height and pass it into move_by_offset like this: .move_by_offset(0, h). You can't do that, because when you select an element the cursor starts from its middle, so you need to cut that number in half (and round it down so you don't end up with a decimal). This is what I ended up doing, and it worked:
# import ActionChains
from selenium.webdriver.common.action_chains import ActionChains
import math
scroll_bar_xpath = "//div[@ng-if='::vm.isVirtual']/div[@class='ps-scrollbar-y-rail']"
element = driver.find_element(By.XPATH, scroll_bar_xpath)
size = element.size
w = size['width']
h = size['height']
#Calculate where to click
click_place = math.floor(h / 2)
# Do Stuff
ActionChains(driver).move_to_element(element).move_by_offset(0, click_place).click().perform() #50 worked
Hope it helps!
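One more alternative worth knowing about: if the results live inside a scrollable container, you can often scroll that container directly with JavaScript instead of clicking the scrollbar rail. The container XPath below is just the rail's parent from my snippet above and may need adjusting for your page:
from selenium.webdriver.common.by import By

# Assumed: the scrollable container is the parent of the scrollbar rail.
container = driver.find_element(By.XPATH, "//div[@ng-if='::vm.isVirtual']")
# Scroll the container itself (not the window) down by one viewport of its own height.
driver.execute_script("arguments[0].scrollTop += arguments[0].clientHeight;", container)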
I am trying to scrape the terms and definitions from this website, using the Selenium Chrome driver in Python: https://quizlet.com/433328443/ap-us-history-flash-cards/. There are 533 terms... so many, in fact, that Quizlet makes you click a "See more" button if you want to see all the terms. The following code successfully extracts terms and definitions (I have tested it on other Quizlet sets with fewer terms). There are also if statements to deal with pop-ups and the "See more" button. Again, my goal is to get the terms and definitions for every single term-definition pair on the page; however, to do this, the entire page needs to be loaded in, which is the basis of my problem.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path = chrome_driver_path)
driver.get("https://quizlet.com/433328443/ap-us-history-flash-cards/")
# IN CASE OF POPUP, CLICK AWAY
if len(driver.find_elements_by_xpath("//button[@class='UILink UILink--revert']")) > 0:
    popup = driver.find_element_by_xpath("//button[@class='UILink UILink--revert']")
    popup.click()
    del popup
# SCROLL TO BOTTOM TO LOAD IN ALL TERMS, AND THEN BACK TO THE TOP
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# INCASE OF "SEE MORE" BUTTON AT BOTTOM, CLICK IT
if len(driver.find_elements_by_xpath("//button[#class='UIButton UIButton--fill' and #aria-label='See more']")) > 0:
see_more = driver.find_element_by_xpath("//button[#class='UIButton UIButton--fill' and #aria-label='See more']")
see_more.click()
del see_more
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# list of terms
quizlet_terms = tuple(map(lambda a: a.text,
                          driver.find_elements_by_class_name("SetPageTerm-wordText")))
# list of definitions
quizlet_definitions = tuple(map(lambda a: a.text,
                                driver.find_elements_by_class_name("SetPageTerm-definitionText")))
In my code, I have tried the scrolling down trick to load in everything, but this does not work. This is because as I scroll down, while terms in my browser window are loaded, terms above and below my browser window get unloaded. Obviously, this is done for memory reasons, but I do not care about memory and I just want for all the terms to be loaded at once so I can access their contents. My code works on smaller quizlet sites (with say 100 terms), but it breaks on this site, generating the following error:
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
This stackoverflow page explains the error message: Python with Selenium "element is not attached to the page document".
From reading the aforementioned page, I have come to the conclusion that because the website is so large, as I scroll down the quizlet page, the terms I am currently looking at in my browser window are loaded, but terms that I have scrolled past and are no longer in my view are unloaded and stored in some funky way that I cannot properly access, generating the error message.
How would one go about in keeping the entirety of the page loaded-in so I can access the contents of all 533 terms? Ideally, I would like a solution that keeps everything I have scrolled past fully-loaded in, and does not unload anything. Another idea is that the whole page is loaded in from the get-go. It would also be nice if there is some memory-saving solution to this, perhaps by simply accessing just the raw html code and no fancy graphics or anything. Has anyone ever encountered this problem, and if so, how did you solve it? Thank you, any help is appreciated.
Much thanks to @Abhishek Dhoundiyal's comment. My working code:
from re import sub
import numpy

driver.execute_script("window.scrollTo(800, 800);")
terms_in_this_set = int(sub(r"\D", "", (driver.find_element_by_xpath("//h4[@class='UIHeading UIHeading--assembly UIHeading--four']")).text))
chunk_size = 15000
quizlet = numpy.empty(shape = (0, 2), dtype = "str")
# done in while loop so that terms and definitions can be extracted while scrolling (while making sure there are no duplicate entries)
while len(quizlet) != terms_in_this_set:
    # IN CASE OF "SEE MORE" BUTTON, CLICK IT TO SEE MORE
    if len(driver.find_elements_by_xpath("//button[@class='UIButton UIButton--fill' and @aria-label='See more']")) > 0:
        see_more = driver.find_element_by_xpath("//button[@class='UIButton UIButton--fill' and @aria-label='See more']")
        see_more.click()
        del see_more
    # CHECK IF THERE ARE TERMS
    quizlet_terms_classes = driver.find_elements_by_class_name("SetPageTerm-wordText")
    quizlet_definitions_classes = driver.find_elements_by_class_name("SetPageTerm-definitionText")
    if (len(quizlet_terms_classes) > 0) and (len(quizlet_definitions_classes) > 0):
        # append current iteration terms and definitions to the full quizlet array
        # (remove_whitespace is a small helper defined elsewhere in my script)
        quizlet = numpy.vstack((quizlet, numpy.transpose([list(map(lambda term: remove_whitespace(term.text), quizlet_terms_classes)), list(map(lambda definition: remove_whitespace(definition.text), quizlet_definitions_classes))])))
        # get unique rows
        quizlet = numpy.unique(quizlet, axis = 0)
    del quizlet_terms_classes, quizlet_definitions_classes
    driver.execute_script(f"window.scrollBy(0, {chunk_size})")
del terms_in_this_set
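As a side note, since the terms act as unique keys, a plain dict could stand in for the numpy.vstack/numpy.unique bookkeeping; a minimal sketch of the same dedup idea:
quizlet = {}
# inside the scroll loop, after finding the term/definition elements:
for term, definition in zip(quizlet_terms_classes, quizlet_definitions_classes):
    quizlet[term.text.strip()] = definition.text.strip()
# keep scrolling until len(quizlet) == terms_in_this_set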
I have a page with 3 radio buttons on it. I want my code to consecutively click each of these buttons, and as each one is clicked, a value (mpn) is displayed, which I want to obtain. I am able to write the code for a single radio button, but I don't understand how I can create a loop so that only the value attribute of the button changes (value = {1, 2, 3}).
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome(executable_path=r"C:\Users\Home\Desktop\chromedriver.exe")
driver.get("https://www.1800cpap.com/resmed-airfit-n30-nasal-cpap-mask-with-headgear")
soup = BeautifulSoup(driver.page_source, 'html.parser')
size = driver.find_element_by_xpath("//input[@class='product-views-option-tile-input-picker' and @value='2']")
size.click()
mpn = driver.find_element_by_xpath("//span[@class='mpn-value']")
print(mpn.text)
Also, for each page the buttons vary in number and in name. So, if there is any general solution that I could extend to all pages, for all buttons, it would be highly appreciated. Thanks!
Welcome to SO!
You were a small step from the correct solution! In particular, the find_element_by_xpath() function returns a single element, but the similar function find_elements_by_xpath() (mind the plural) returns an iterable list, which you can use to implement a for loop.
Below is an MWE with the example page that you provided:
from selenium import webdriver
import time
driver = webdriver.Firefox() # initiate the driver
driver.get("https://www.1800cpap.com/resmed-airfit-n30-nasal-cpap-mask-with-headgear")
time.sleep(2) # sleep for a couple seconds to ensure correct upload
mpn = [] # initiate an empty results' list
for button in driver.find_elements_by_xpath("//label[@data-label='label-custcol3']"):
    button.click()
    mpn.append(driver.find_element_by_xpath("//span[@class='mpn-value']").text)
print(mpn) # print results
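One caveat, echoing the stale-element discussions above: if a click re-renders the option labels, iterating over a pre-fetched list can raise a StaleElementReferenceException. A defensive variant re-finds the buttons by index on each pass (same locators; just a sketch):
mpn = []
count = len(driver.find_elements_by_xpath("//label[@data-label='label-custcol3']"))
for i in range(count):
    # Re-find the buttons each pass in case the previous click re-rendered them
    buttons = driver.find_elements_by_xpath("//label[@data-label='label-custcol3']")
    buttons[i].click()
    mpn.append(driver.find_element_by_xpath("//span[@class='mpn-value']").text)
print(mpn)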
I want to scrape data from an HTML table for different combinations of drop down values via looping over those combinations. After a combination is chosen, the changes need to be submitted. This is, however, causing an error since it refreshes the page.
This it what I've done so far:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time

browser = webdriver.Chrome()  # driver setup was omitted in the original snippet
browser.get('https://daten.ktbl.de/feldarbeit/entry.html')
# Selecting the constant values of some of the drop downs:
fertilizer = Select(browser.find_element_by_name("hgId"))
fertilizer.select_by_value("2")
fertilizer = Select(browser.find_element_by_name("gId"))
fertilizer.select_by_value("193")
fertilizer = Select(browser.find_element_by_name("avId"))
fertilizer.select_by_value("383")
fertilizer = Select(browser.find_element_by_name("hofID"))
fertilizer.select_by_value("2")
# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for size_values in size.options:
    size.select_by_value(size_values.get_attribute("value"))
    time.sleep(1)
    amount = Select(browser.find_element_by_name("mengeID"))
    for amount_values in amount.options:
        amount.select_by_value(amount_values.get_attribute("value"))
        time.sleep(1)
        # Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)
This leads to the error: selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <option> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed.
Obviously the issue is that I did indeed refresh the document.
After submitting the changes and once the page has loaded the results, I want to retrieve them with:
html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")
(Shout-out to @bink1time who answered this part of my question here).
How can I update the page without breaking the loop?
I would very much appreciate some help here!
A StaleElementReferenceException often occurs upon page refresh because an element's internal UUID changes in the DOM.
To avoid it, always search for an element right before interacting with it. In your particular case, you searched for size and amount, found them, and stored them in variables. But upon refresh their UUIDs changed, so the old references you stored are no longer attached to the DOM. When you try to interact with them, Selenium cannot find them in the DOM and throws this exception.
I modified your code to always re-search size and amount elements before the interaction:
# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for i in range(len(size.options)):
    # Search and save a fresh select element
    size = Select(browser.find_element_by_name("flaecheID"))
    size.select_by_value(size.options[i].get_attribute("value"))
    time.sleep(1)
    amount = Select(browser.find_element_by_name("mengeID"))
    for j in range(len(amount.options)):
        # Search and save a fresh select element
        amount = Select(browser.find_element_by_name("mengeID"))
        amount.select_by_value(amount.options[j].get_attribute("value"))
        time.sleep(1)
        # Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)
Try this? It worked for me. I hope it helps.
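If the fixed time.sleep(5) after submitting ever proves flaky, one option is to wait for the clicked submit button to go stale, which signals that the refresh has actually happened; a sketch using Selenium's built-in expected condition:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

button = browser.find_element_by_xpath("//*[@type='submit']")
button.click()
# Block until the clicked button detaches from the DOM, i.e. the page has reloaded.
WebDriverWait(browser, 15).until(EC.staleness_of(button))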
I'm doing some scraping with Selenium in Python. My problem is that after I find all the WebElements, I'm unable to get their info (id, text, etc.) if the element is not actually VISIBLE in the browser window opened by Selenium.
What I mean is:
First image
Second image
As you can see from the first and second images, I have the first 4 "tables" that are "visible" for me and for the code. There are however, other 2 tables (5 & 6 Gettho lucky dip & Sue Specs) that are not "visible" until I drag down the right bar.
Here's what I get when I try to get the element info, without "seeing it" in the page:
Third image
Manually dragging the page to the bottom, and therefore making it "visible" to the human eye (and also to the code?), is the only way I can get the data from the WebElement I need:
Fourth image
What am I missing? Why can't Selenium do it in the background? Is there a way to solve this problem without going up and down the page?
PS: the page could be any kind of dog race page in http://greyhoundbet.racingpost.com/. Just click City - Time - and then FORM.
Here's part of my code:
# I call this function with the URL and it returns the driver object
def open_main_page(url):
    chrome_path = r"c:\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get(url)
    # Wait for page to load
    loading(driver, "//*[@id='showLandingLADB']/h4/p", 0)
    element = driver.find_element_by_xpath("//*[@id='showLandingLADB']/h4/p")
    element.click()
    # Wait for second element to load, after click
    loading(driver, "//*[@id='landingLADBStart']", 0)
    element = driver.find_element_by_xpath("//*[@id='landingLADBStart']")
    element.click()
    # Wait for main page to load.
    loading(driver, "//*[@id='whRadio']", 0)
    return driver
Now I have the browser "driver" which I can use to find the elements I want
url = "http://greyhoundbet.racingpost.com/#card/race_id=1640848&r_date=2018-
09-21&tab=form"
browser = open_main_page(url)
# Find dog names
names = []
text: str
tags = browser.find_elements_by_xpath("//strong")
Now "TAGS" is a list of WebDriver elements as in the figures.
I'm pretty new to this area.
UPDATE:
I've solved the problem with a code workaround.
tags = driver.find_elements_by_tag_name("strong")
for tag in tags:
    driver.execute_script("arguments[0].scrollIntoView();", tag)
    print(tag.text)
In this manner the browser will move to the element position and it will be able to get its information.
However, I still have no idea why, with this page in particular, I'm not able to read webpage elements that are not visible in the browser area until I scroll and literally see them.
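A likely explanation, for what it's worth: WebElement.text only returns text that is actually rendered, so nodes outside the viewport (especially in pages that render rows lazily as you scroll) come back empty. When the nodes do exist in the DOM, reading their textContent attribute usually works without any scrolling; a sketch:
tags = driver.find_elements_by_tag_name("strong")
for tag in tags:
    # textContent is populated even for elements that are not rendered on screen
    print(tag.get_attribute("textContent"))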