Loop a List of Links - Selenium Python - python

I've got the following use case.
I want to loop through the different games on this website:
https://sports.bwin.de/en/sports/football-4/betting/germany-17
Each game has a detail page, which is reached through this element:
grid-event-wrapper
Looping over these elements, I would have to click on each one, scrape the data from the detail page, and go back.
Something like this:
events = driver.find_elements_by_class_name('grid-event-wrapper')
for event in events:
    event.click()
    time.sleep(5)
    # =============================================================================
    # Logic for scraping detailed information
    # =============================================================================
    driver.back()
    time.sleep(5)
The first iteration works fine, but on the second one it throws the following exception:
StaleElementReferenceException: stale element reference: element is not attached to the page document
(Session info: chrome=90.0.4430.93)
I tried different things like re-initializing my events, but nothing worked.
I am sure there is a way to keep the state even if I have to go back in the browser.
Thanks for your help in advance.

Instead of the for event in events: loop, try the following:
size = len(driver.find_elements_by_class_name('grid-event-wrapper'))
for i in range(1, size + 1):
    # XPath positions are 1-based; the parentheses index the full result list
    xpath = f"(//div[@class='grid-event-wrapper'])[{i}]"
    driver.find_element_by_xpath(xpath).click()
    # now do what you want here, and finally go back
    driver.back()

Clicking on the element reloads the page, thereby losing the old references.
There are two things you can do.
One is to keep a global set in which you store the "ID" of each game you have already visited (you can use the URL of the game, e.g. https://sports.bwin.de/en/sports/events/fsv-mainz-05-hertha-bsc-11502399, as the ID, or any other distinguishing characteristic).
Alternatively, you can first extract all the links. These are the first children of your grid-event-wrapper elements, so you can call event.find_element_by_tag_name('a') and read the href attribute of each. Once all links are extracted, you can load them one by one.
events = driver.find_elements_by_class_name('grid-event-wrapper')
links = []
for event in events:
    link = event.find_element_by_tag_name('a').get_attribute('href')
    links.append(link)
for link in links:
    driver.get(link)  # Load the link
    # Extraction logic
I feel the second way is a bit cleaner.
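The first option (tracking visited games) is described only in prose. Here is a minimal sketch of it, assuming each grid-event-wrapper contains an a element whose href uniquely identifies the game, as the answer suggests. It uses the same legacy find_elements_by_* calls as the question; these were removed in Selenium 4.3, where driver.find_elements(By.CLASS_NAME, ...) is the equivalent. The names scrape_all_games and scrape_detail_page are hypothetical.

```python
def scrape_all_games(driver, scrape_detail_page):
    """Click through every game once, tracking visited games by URL."""
    visited = set()   # hrefs of games that were already scraped
    results = []
    while True:
        # Re-find the wrappers on every pass: references obtained before a
        # navigation are stale afterwards.
        events = driver.find_elements_by_class_name('grid-event-wrapper')
        target = None
        for event in events:
            href = event.find_element_by_tag_name('a').get_attribute('href')
            if href not in visited:
                visited.add(href)
                target = event
                break
        if target is None:
            return results   # every game on the page has been visited
        target.click()
        # scrape_detail_page is a hypothetical callback supplied by the caller
        results.append(scrape_detail_page(driver))
        driver.back()
```

Waiting for the list to reappear after driver.back() is left out of the sketch; the question used time.sleep(5) for that.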

Related

Selenium - stale element reference: element is not attached to the page document

I navigate to a page and then find the list of episodes. I get each episode and click on the link for the episode. But when I go back to the page that has the list of episodes the following error happens:
stale element reference: element is not attached to the page document
My code is:
navigator.get('https://anchor.fm/dashboard/episodes')
time.sleep(5)
# get list
list_episodes = navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/ul')
# get episodes in list
items = list_episodes.find_elements_by_tag_name('li')
for item in items:
    item.find_element_by_tag_name('button').click()
    time.sleep(10)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/div/div[2]/div/div/div/button').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/div/div[2]/div/div/div/div/div/div[1]/button[6]').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[1]/div/div/div/button').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[1]/div/div/div/div/div/div/button[1]').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[3]/div/div/a').click()
    time.sleep(3)
    navigator.get('https://anchor.fm/dashboard/episodes')
    time.sleep(5)
When you navigate to another page, all web elements that Selenium has collected (they are really references to elements in the live page) become invalid, because the page is rebuilt when you open it again.
To make your code work, you need to collect the items list again each time.
This should work:
navigator.get('https://anchor.fm/dashboard/episodes')
time.sleep(5)
# get list
list_episodes = navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/ul')
# get episodes in list
items = list_episodes.find_elements_by_tag_name('li')
for i in range(len(items)):
    item = items[i]
    item.find_element_by_tag_name('button').click()
    time.sleep(10)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/div/div[2]/div/div/div/button').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/div/div[2]/div/div/div/div/div/div[1]/button[6]').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[1]/div/div/div/button').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[1]/div/div/div/div/div/div/button[1]').click()
    time.sleep(2)
    navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[3]/div[3]/div/div/a').click()
    time.sleep(3)
    navigator.get('https://anchor.fm/dashboard/episodes')
    time.sleep(5)
    # get the `items` list again; list_episodes is stale too, so re-find it first
    list_episodes = navigator.find_element_by_xpath('//*[@id="app-content"]/div/div/div/div[2]/ul')
    items = list_episodes.find_elements_by_tag_name('li')
In this case, it's helpful to think of the pages as instances of a class. They may have the same name, the same properties, the same values but they're still separate objects and you can't call object A if you have a reference to object B.
Here's what's happening to you in this case:
1. You navigate to a directory page
2. Server builds an instance of the page & displays it to you
3. You get a hold of episode objects on the page
4. You navigate to one of the episodes
5. Server destroys the directory page. Any objects in it you were holding disappear with it
6. Server builds a copy of the episode page & displays it to you
7. You navigate back to the directory page
8. Server builds a new instance of the page & displays it to you
9. You try to click an element from the old instance of the page that no longer exists
10. You get a stale reference exception because, well - your reference is now stale
The way to fix this is to find the episode elements each time you navigate to the directory page. If you find them once and store them, they'll go bad as soon as you navigate elsewhere and their parent page poofs.
Also, a note about your Xpaths: I'd encourage you to stop using your browser's 'Copy Xpath' function, it doesn't often get good results. There are plenty of tutorials on how to write good Xpaths online that are worth reading.
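The "pages as instances of a class" analogy can be made concrete with a toy model in plain Python. Nothing below is Selenium API: ToyBrowser, ToyPage, ToyElement and StaleReference are invented names that only mimic the lifecycle described above.

```python
class StaleReference(Exception):
    """Raised when an element's page is no longer the one on display."""


class ToyElement:
    def __init__(self, page, name):
        self.page = page
        self.name = name

    def click(self):
        # An element is only usable while its own page instance is displayed.
        if self.page.browser.current_page is not self.page:
            raise StaleReference(self.name)
        return f"clicked {self.name}"


class ToyPage:
    def __init__(self, browser, url):
        self.browser = browser
        self.url = url

    def find_elements(self):
        return [ToyElement(self, f"{self.url}#episode{i}") for i in range(3)]


class ToyBrowser:
    def __init__(self):
        self.current_page = None

    def get(self, url):
        # Every navigation builds a brand-new page object, even for the same URL.
        self.current_page = ToyPage(self, url)
        return self.current_page


browser = ToyBrowser()
old_items = browser.get("directory").find_elements()
browser.get("episode")                                 # old directory page is gone
new_items = browser.get("directory").find_elements()   # same URL, new instance

print(new_items[0].click())   # works: belongs to the page on display
try:
    old_items[0].click()      # fails: belongs to the destroyed instance
except StaleReference as exc:
    print("stale:", exc)
```

The old and new lists describe the same episodes, but only the list found after the latest navigation can be clicked.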

I'm not sure why this xpath is unable to be located using Python Selenium

I'm trying to scrape every item on a site that's displayed in a grid format with infinite scrolling. However, I'm stuck on even getting the second item using xpath because it's saying:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='el-card advertisement card is-always-shadow'][2]"}
x = 1
while x < 5:
    time.sleep(5)
    target = driver.find_element_by_xpath(f"//div[@class='el-card advertisement card is-always-shadow'][{x}]")
    target.click()
    wait.until(EC.visibility_of_element_located((By.ID, "details")))
    print(driver.current_url)
    time.sleep(5)
    driver.back()
    time.sleep(5)
    WebDriverWait(driver, 3).until(EC.title_contains("Just Sold"))
    time.sleep(5)
    x += 1
With my f-string XPath it finds the first div with that class and prints the URL, but once it completes one iteration of the while loop, it fails to find the second div with that class (x = 2).
I've tried monitoring it with all the time.sleep() calls to see exactly where it fails, because I thought it might be running before the page had loaded. But I gave every page ample time to finish loading, and I still can't find the issue.
This is the structure of the HTML on that page:
There is a class of that name (as well as "el-card__body" which I have also tried using) within each div, one for each item being displayed.
(This is what each div looks like)
Thank you for the help in advance!
(disclaimer: this is for a research paper, I do not plan on selling/benefiting off of the information being collected)
Storing each item in a list using find_elements_by_xpath, then iterating through them did the trick, as suggested by tdelaney.
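One likely reason the indexed XPath only ever found the first card (an assumption, since the page's HTML is not reproduced here): in XPath, //div[...][2] means "a matching div that is the second such div within its own parent". If every card sits in its own wrapper element, only [1] can match. Wrapping the expression in parentheses indexes the whole result list instead. The helper name nth_match is hypothetical.

```python
def nth_match(xpath, index):
    """Return an XPath selecting the index-th (1-based) match in document order."""
    if index < 1:
        raise ValueError("XPath positions start at 1")
    # The parentheses make [index] apply to the complete node list,
    # not to each parent's children separately.
    return f"({xpath})[{index}]"


CARD = "//div[@class='el-card advertisement card is-always-shadow']"
print(nth_match(CARD, 2))
```

In the loop from the question, the result would be passed as driver.find_element_by_xpath(nth_match(CARD, x)).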

Python nested for loop selenium

I'm creating two for loops (one nested in the other).
My code looks like this:
try:
    a = browser.find_elements_by_class_name("node")
    for links in a:
        links.click()
        for id in range(2, 41):
            my_id = "stree{}".format(id)
            browser.find_element_by_id(my_id).click()
            browser.find_element_by_xpath('/html/body/center[2]/form/table[1]/tbody/tr/td[3]/table/tbody/tr[5]/td[1]/a[1]/img').click()
            browser.find_element_by_xpath('/html/body/center[2]/form/table[2]/tbody/tr/td[4]/input').click()
            browser.find_element_by_xpath('/html/body/center/form/table[2]/tbody/tr/td[5]/a').click()
            sleep(5)
            # browser.execute_script("window.history.go(-1)")
except:
    a = browser.find_elements_by_class_name("node")
    for links in a:
        links.click()
        for id in range(2, 41):
            my_id = "stree{}".format(id)
            browser.find_element_by_id(my_id).click()
            browser.find_element_by_xpath('/html/body/center[2]/form/table[1]/tbody/tr/td[3]/table/tbody/tr[5]/td[1]/a[1]/img').click()
            browser.find_element_by_xpath('/html/body/center[2]/form/table[2]/tbody/tr/td[4]/input').click()
            browser.find_element_by_xpath('/html/body/center/form/table[2]/tbody/tr/td[5]/a').click()
            sleep(5)
            # browser.execute_script("window.history.go(-1)")
What the code is doing:
It is going through two for loops and then going to a new page where it clicks on something. Then, I want the browser to go back to go through the for loops. The problem is that since the outer loop has to be executed first before the second loop could be executed, I face some issue while going back.
Two important questions:
1. Do I need to tell my browser to go back?
2. How can I execute the outercode first and then the code within?
(The question included three screenshots that are not reproduced here: the page itself, the HTML for the outer loop, and the HTML for the inner loop, whose elements lead to the next page when clicked.)
How do I improve my code? Just to clarify: I want to go through all the files.
Edit: Someone asked for more clarification. In the photo of the page (attached), do you see folder icons? I want to click on them, that opens up all the file icons. I'm choosing those files by clicking on them, then clicking on the arrow in the page to put it into some box, and then clicking on "Accepting my selection" which takes me to the next page where I click on Excel, and that downloads my file. The "for-loop" is my attempt to go through all the files in those folders. Obviously, I have given a large explanation, but the point remains about the for-loop. The "class name - node" refers to folder icons and "for-id" refers to the file icons.
At the end of the outer for loop, you could add a function that goes back to the starting page in order to click the next link
OR
Instead of clicking the links, you could collect them and then use the outer loop to connect to these links. In other words, collect all links of the starting page and then make your browser connect to each one with the outer for loop. More specifically:
First, you create a browser instance (ABrowser could be Firefox() or anything else) and connect to the starting webpage as you already do:
browser = webdriver.ABrowser()
browser.get(StartingPageURL)
Then you collect the URLs of all links with the desired characteristics (this assumes the "node" elements are links that carry an href attribute):
a = [node.get_attribute("href") for node in browser.find_elements_by_class_name("node")]
Now a is a list of link URLs. Instead of clicking a link, doing the job and going back to the starting page, you can make your browser connect to a link URL, do the job, then connect to the next link URL, and so on, with a for loop:
for link in a:
    browser.get(link)  # browser connects to the link URL
    for id in range(2, 41):
        my_id = "stree{}".format(id)
        browser.find_element_by_id(my_id).click()
        browser.find_element_by_xpath('/html/body/center[2]/form/table[1]/tbody/tr/td[3]/table/tbody/tr[5]/td[1]/a[1]/img').click()
        browser.find_element_by_xpath('/html/body/center[2]/form/table[2]/tbody/tr/td[4]/input').click()
        browser.find_element_by_xpath('/html/body/center/form/table[2]/tbody/tr/td[5]/a').click()
        sleep(5)
Usually I prefer the second option
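The first option (returning to the starting page) comes without code. A minimal sketch follows, assuming the starting page can simply be reloaded by URL and that the folder icons keep the class name "node" from the question. The names process_all_files, start_url and handle_file are hypothetical; handle_file stands for the clicks that select and download one file.

```python
def process_all_files(browser, start_url, handle_file, file_ids=range(2, 41)):
    """Visit every folder/file pair, reloading the start page before each one."""
    browser.get(start_url)
    folder_count = len(browser.find_elements_by_class_name("node"))
    for folder_index in range(folder_count):
        for file_id in file_ids:
            # Reload and re-find: references from the previous page are stale.
            browser.get(start_url)
            folders = browser.find_elements_by_class_name("node")
            folders[folder_index].click()   # open the folder again
            # handle_file is a hypothetical callback: it receives the element
            # id of the file ("stree2", "stree3", ...) and does the download.
            handle_file(browser, "stree{}".format(file_id))
```

This mirrors the two loops from the question, so it inherits their assumption that every folder exposes the ids stree2 to stree40.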

Python-Selenium Fetching data from previous listing webpage using selenium in python

I am trying to fetch some data from a detail page using Selenium and Python. I am new to both.
I tried to use
self.driver.execute_script("window.history.go(-1)")
to go back to the previous page and start fetching the 2nd, 3rd, 4th record and so on, but the issue is this:
After fetching the first record, the click event occurs, the browser moves to the detail page and fetches the remaining data. But when it moves back from the detail page to the listing page, it throws an error on
cmpname = selectAll.find_element_by_css_selector(".Capsuletitle h2")
The error is StaleElementReferenceException: Element is not attached to the page document
Basically, I have a listing page and a detail page for each record, and I want to fetch data from both pages.
Here is the loop part of my code:
parentTab = self.driver.find_element_by_class_name("capsuleList")
for selectAll in parentTab.find_elements_by_class_name("bsCapsule"):
    cmpname = selectAll.find_element_by_css_selector(".Capsuletitle h2")
    print(cmpname.text)
    address = selectAll.find_element_by_css_selector(".Capsuleaddress a span")
    print(address.text)
    telephone = selectAll.find_element_by_css_selector(".Capsuletel")
    print(telephone.text)
    selectAll.find_element_by_css_selector('.Capsuletitle div a').click()
    time.sleep(20)
    adrurl = self.driver.find_element_by_css_selector('.CapsulecallToAction a').get_attribute('href')
    print(adrurl)
    self.driver.execute_script("window.history.go(-1)")
    time.sleep(20)
Regards
First, instead of using self.driver.execute_script("window.history.go(-1)"), you can just use driver.back(). As for the error you're getting, the most frequent cause is that the page the element was part of has been refreshed, or that the user has navigated away to another page. So I suggest you try looking up the element again after you go back a page. I hope this helps!
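Looking the element up again could be sketched as follows, reusing the selectors from the question and the same legacy find_element_by_* calls. The function name scrape_listing_and_details and the dictionary keys are hypothetical, and the waits after each navigation (time.sleep(20) in the question) are left out.

```python
def scrape_listing_and_details(driver):
    """Collect listing fields and the detail-page URL for every record."""
    records = []
    count = len(driver.find_elements_by_class_name("bsCapsule"))
    for index in range(count):
        # Look the capsule up again: earlier references died with the old page.
        capsule = driver.find_elements_by_class_name("bsCapsule")[index]
        record = {
            "name": capsule.find_element_by_css_selector(".Capsuletitle h2").text,
            "address": capsule.find_element_by_css_selector(".Capsuleaddress a span").text,
            "telephone": capsule.find_element_by_css_selector(".Capsuletel").text,
        }
        # Read everything needed from the listing before leaving the page.
        capsule.find_element_by_css_selector(".Capsuletitle div a").click()
        record["url"] = driver.find_element_by_css_selector(
            ".CapsulecallToAction a").get_attribute("href")
        records.append(record)
        driver.back()
    return records
```

Only the index survives from one iteration to the next; every element is found on the page that is currently displayed.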

Selenium + Python: StaleElementReferenceException (selecting by class)

I'm trying to write a simple Python script using Selenium, and while the loop runs once, I'm getting a StaleElementReferenceException.
Here's the script I'm running:
from selenium import webdriver
browser = webdriver.Firefox()
type(browser)
browser.get('http://digital2.library.ucla.edu/Search.do?keyWord=&selectedProjects=27&pager.offset=50&viewType=1&maxPageItems=1000')
links = browser.find_elements_by_class_name('searchTitle')
for link in links:
    link.click()
    print("clicked!")
    browser.back()
I did try adding browser.refresh() to the loop, but it didn't seem to help.
I'm new to this, so please ... don't throw stuff at me, I guess.
It does not make sense to click through links inside of a loop. Once you click the first link, then you are no longer on the page where you got the links. Does that make sense?
Take the for loop out, and add something like link = links[0] to grab the first link, or something more specific to grab the specific link you want to click on.
If the intention is to click on every link on the page, then you can try something like the following:
links = browser.find_elements_by_class_name('searchTitle')
for i in range(len(links)):
    links = browser.find_elements_by_class_name('searchTitle')
    link = links[i] # specify the i'th link on the page
    link.click()
    print("clicked!")
    browser.back()
EDIT: This might also be as simple as adding a pause after the browser.back(). You can do that with the following:
from time import sleep
...
sleep(5) # specify pause in seconds
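A fixed pause is either longer than needed or, on a slow load, too short. Selenium's WebDriverWait, which appears in one of the questions above, polls a condition instead. The dependency-free sketch below shows the same idea; wait_until is a hypothetical helper, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result   # hand the truthy value back to the caller
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

After browser.back(), a call such as links = wait_until(lambda: browser.find_elements_by_class_name('searchTitle')) would return as soon as the list is non-empty, since an empty list counts as "not ready yet".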
