I am writing a web scraper that uses data from an existing spreadsheet to pull data from a website. It takes codes (that reference products) from a certain column and searches the site for them. However, searching for one product displays multiple results, only one of which is a correct match. I have created a system that can search for the correct code and select the product via find_element_by_xpath, but it does not account for multiple pages of results. My goal is, when the code is not found, to move to the next page and search for the same code without moving to the next Excel row, stopping when the final page is reached. I have already found a snippet of code that should handle moving to the next page:
try:
    _driver.find_element_by_class_name("next").click()
    print("Navigating to Next Page")
except NoSuchElementException:  # find_element raises this; TimeoutException only comes from WebDriverWait
    print("Final Page")
    break
However, I am unsure where or how I would implement this without either breaking the code or moving down a row.
Here is a snippet of how my code works so far (simplified, obviously):
for i in data.index:  # data is the spreadsheet column
    try:
        # locate product code
        # copy product link
        # navigate to link
        try:
            # wait for site to load
            # copy data to spreadsheet
        except TimeoutException:
            # skip if site takes too long
    except Exception as e:
        # catch any possible exceptions and continue the loop (normally when the product cannot be found)
Any help would be much appreciated, whether it be how to implement the code snippet above or a better way to go about moving from page to page. If needed, I can supply a link to the website or snippets of my code in further detail :)
A Python program terminates as soon as it encounters an unhandled error. In Python, an error can be a syntax error or an exception. The try-except construct lets you test code and catch exceptions that might occur without terminating the program.
As for your question, you might want to use a recursive function to travel through the pages.
You could try something like this:
def rec(site, product):
    if final_page:
        return exception_not_found
    try:
        # locate product code
        try:
            # wait for site to load
            # copy data to spreadsheet
            if found_product:
                return  # found, stop recursing
        except TimeoutException:
            return  # skip if site takes too long
    except Exception as e:
        return  # skip if it fails?
    if we_did_not_find_product:
        # copy product link
        # navigate to link
        # navigate to next site
        rec(next_site, product)
for i in data.index:  # data is the spreadsheet column
    rec(init_site, i)
Meaning: for each row in the spreadsheet, we go to the initial page and look for the product; if we do not find it, we move to the next page until we either find the product or reach the last page. We move on to the next row in these cases: an exception occurs, the product is found, or the last page is reached.
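As a rough illustration of that idea with concrete Selenium calls (a minimal sketch only: product_found and copy_to_spreadsheet are hypothetical helpers standing in for your XPath check and extraction logic, and the "next" class name is taken from your snippet):

from selenium.common.exceptions import NoSuchElementException, TimeoutException

def rec(driver, product):
    try:
        if product_found(driver, product):        # hypothetical: your find_element_by_xpath check
            copy_to_spreadsheet(driver, product)  # hypothetical: your extraction logic
            return True                           # found: stop recursing, caller moves to the next row
    except TimeoutException:
        return False                              # site took too long: give up on this row
    try:
        driver.find_element_by_class_name("next").click()  # go to the next results page
    except NoSuchElementException:
        return False                              # no "next" button: last page reached
    return rec(driver, product)                   # same product, next page

for i in data.index:
    rec(driver, data[i])  # the spreadsheet row only advances here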
How I went about it (storing the page mover and the code checker as functions and having them call each other):
def page_mover():
    try:
        # Click Next page
        page_link()
    except Exception:
        print("Last page reached")

def page_link():
    try:
        # Wait for page to load
        # Get link using product code
        # Go to link
    except Exception:
        page_mover()
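Filled in with concrete Selenium calls, that mutual recursion could look something like this (a sketch only: the text-based XPath pattern, the "next" class name, and the 10-second timeout are assumptions to adapt to your site):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException

def page_mover(code):
    try:
        driver.find_element_by_class_name("next").click()  # click Next page
        page_link(code)                                    # retry the same code on the new page
    except NoSuchElementException:
        print("Last page reached")                         # no Next button: give up on this code

def page_link(code):
    try:
        WebDriverWait(driver, 10).until(                   # wait for the results to render
            EC.presence_of_element_located((By.XPATH, f"//a[contains(text(), '{code}')]")))
        link = driver.find_element_by_xpath(f"//a[contains(text(), '{code}')]").get_attribute("href")
        driver.get(link)                                   # go to the matching product
    except (NoSuchElementException, TimeoutException):
        page_mover(code)                                   # not on this page: try the next one

One caveat: for a site with very many pages, an iterative loop avoids Python's recursion limit, but for a handful of pages this mirrors your structure exactly.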
Related
I'm trying to find a specific element on a page.
The page automatically changes its content after it is loaded and after certain validations pass.
So, after the page is loaded, we try to determine whether it has finished loading based on whether a specific element is present in the changed content, but it doesn't work.
I want the following code to be executed if the element is found within the specified time.
If I can't find it by that time, I want to close the Selenium object and end the script.
# Waiting to detect element
try:
    UserListElement = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "mainUserSearchDiv"))
    )
except TimeoutException:  # Can't find element in time.
    print("Cannot find element")
    browser.close  # Close selenium.
finally:  # Can find element
    print("Complete Load Page")
Looks like your problem is with browser.close.
Instead of close it should be close().
So please change from
browser.close
to
browser.close()
Without the parentheses you are only referencing the method object, not calling it, so the browser never actually closes. With that change, your code should be correct.
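For clarity, the corrected except block would read as below. One further note, offered as an observation: a finally clause runs whether or not the exception occurred, so your "Complete Load Page" message prints even on a timeout; an else clause runs only when the element was found.

try:
    UserListElement = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "mainUserSearchDiv"))
    )
except TimeoutException:
    print("Cannot find element")
    browser.close()  # the parentheses actually invoke the method
else:                # runs only when the element was found
    print("Complete Load Page")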
I've got the following use case.
I want to loop through different games on this website:
https://sports.bwin.de/en/sports/football-4/betting/germany-17
Each game has a detail page that can be found via this element:
grid-event-wrapper
By looping over these elements, I would have to click on each one of them, scrape the data from the detail page, and go back.
Something like this:
events = driver.find_elements_by_class_name('grid-event-wrapper')
for event in events:
    event.click()
    time.sleep(5)
    # =============================================================================
    # Logic for scraping detailed information
    # =============================================================================
    driver.back()
    time.sleep(5)
The first iteration is working fine, but on the second one it throws the following exception:
StaleElementReferenceException: stale element reference: element is not attached to the page document
(Session info: chrome=90.0.4430.93)
I tried different things like re-initializing my events, but nothing worked.
I am sure there is a way to hold on to the state even if I have to go back in the browser.
Thanks for your help in advance
Instead of the for event in events: loop, try the following:

size = len(driver.find_elements_by_class_name('grid-event-wrapper'))
for i in range(1, size + 1):
    xpath = f"(//div[@class='grid-event-wrapper'])[{i}]"
    driver.find_element_by_xpath(xpath).click()

Now you do here what you want, and finally go back. Because the element is re-located by its index on every iteration, the reference can never go stale.
Clicking on the element reloads the page, thereby losing the old references.
There are two things you can do.
One is to keep a global set where you store the "ID" of the game (you can use the URL of the game, e.g. https://sports.bwin.de/en/sports/events/fsv-mainz-05-hertha-bsc-11502399, as the ID, or any other distinguishing characteristic).
Alternatively, you can first extract all the links. (These are the first children of your grid-event-wrapper, so you can do event.find_element_by_tag_name('a') and access the href attribute of those.) Once all links are extracted, you can load them one by one.
events = driver.find_elements_by_class_name('grid-event-wrapper')
links = []
for event in events:
    link = event.find_element_by_tag_name('a').get_attribute('href')
    links.append(link)

for link in links:
    # Load the link
    # Extraction logic
I feel the second way is a bit cleaner.
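If it helps, the second loop filled in might look like this (a sketch that reuses the time.sleep(5) from your original code; you could swap the sleep for a WebDriverWait on a detail-page element):

for link in links:
    driver.get(link)  # load the detail page directly; no click, no driver.back()
    time.sleep(5)     # crude wait for the page to render
    # =============================================================================
    # Logic for scraping detailed information
    # =============================================================================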
So I have a Selenium script that will automatically enter a series of numbers into a website, and the website will redirect the user to another website if the numbers match a PIN. However, the browser takes a short time to redirect the user, during which the next line of code would have already run and returned an error.
I was thinking something like this would work, but it doesn't, and I don't know why.
def checkElement():
    try:
        xpath = '//*[@id="name"]'
        print("Page is ready!")
    except TimeoutException:
        print("failed")

checkElement()
I believe that you are looking for WebDriverWait. You can add a specific condition to it. Please find the sample code below.

first_result = wait.until(presence_of_element_located((By.XPATH, "//*[@id='name']")))
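A self-contained version of that, assuming browser is your WebDriver and keeping the 5-second timeout from your original code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def checkElement():
    try:
        first_result = WebDriverWait(browser, 5).until(  # poll for up to 5 seconds
            EC.presence_of_element_located((By.XPATH, "//*[@id='name']")))
        print("Page is ready!")
    except TimeoutException:
        print("failed")

checkElement()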
I am trying to validate that a value changes to the correct text, and if it does not, to refresh the page and check again for up to a set time.
I have tried while loops, if statements, and nested variations of both with no success. I am not even sure how to format it at this point.
element = driver.find_element_by_xpath('xpath')
while True:
    if element contains textA:
        break
    elif element contains textB:
        driver.refresh()
    else:
        error
Something along those lines. Ignore any syntax errors, I am just trying to get the idea across
I have also tried using EC and By with no luck
Edit: Adding some details
So what I have is a table. I am inserting a new row with no problems. Then I need to check that one of the column values of the new row gets updated from 'new' to 'old', which usually takes anywhere from 30 seconds to 2 minutes. This is all viewable from a web UI, and I need to refresh the page in order to see the value change. I wish I had some more detailed code or an error to post along with it, but honestly I am just beginning to learn Selenium.
Can you please try the following:

while True:
    try:
        driver.find_element_by_xpath('xpath')
    except NoSuchElementException:
        driver.refresh()
    else:
        print("Text found")
        break

Note: I suggest creating a text-based XPath to avoid an extra line of code to get and compare the text.
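Since you also want to give up after a set time, one way to bound that loop (a sketch: the text-based XPath //td[text()='old'] and the 120-second deadline are assumptions to adjust to your table):

import time
from selenium.common.exceptions import NoSuchElementException

deadline = time.time() + 120  # stop retrying after ~2 minutes
while True:
    try:
        driver.find_element_by_xpath("//td[text()='old']")  # text-based XPath: no separate compare step
    except NoSuchElementException:
        if time.time() > deadline:
            raise  # the value never changed in time
        time.sleep(5)  # brief pause between refreshes
        driver.refresh()
    else:
        print("Text found")
        break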
I am scraping a website for data from a table which is loaded via AJAX. The website is slow and inconsistent, so sometimes I have to wait less than 5 seconds for the table to load, while other times I have to wait 25-30. I am iterating through hundreds of items that filter the table, and once it has loaded, I go to the next item.
The Explicit Wait / Expected Conditions functionality does not seem to be behaving as I expect, and I wondered if anyone might have some insight.
I have tried numerous approaches to the problem, and I seem to get a different exception each time I run it.
This first snippet is meant to keep trying until it finds the element: I want it to continue running until the page is fully loaded and the element is found. The problem is that while the page is still loading and the element hasn't been found yet, it still throws an exception.
for s in range(0,1000):
    try:
        #Other Month Value Clicked
        wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[9]/div/div[2]/div[" + str(mths[x]) + "]")))
    except NoSuchElementException:
        print(".", end=".", flush=True)
        time.sleep(1)
        timePeriodVal.click()
        time.sleep(1)
        timePeriodVal.click()
        continue
    finally:
        timePeriod = driver.find_element_by_xpath("/html/body/div[9]/div/div[2]/div[" + str(mths[x]) + "]")
        timePeriod.click()
        #print('\nTime Period clicked')
        time.sleep(1.5)
        break
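Two things seem worth checking in that snippet, offered as assumptions rather than a definitive fix: wait.until raises TimeoutException (not NoSuchElementException) when the condition is not met, so that except branch never fires and the exception propagates; and a finally clause runs even after the except branch's continue, so its break ends the loop on the first pass regardless. A reworked sketch under those assumptions:

from selenium.common.exceptions import TimeoutException

for s in range(0, 1000):
    try:
        # wait for this month's value to become visible
        wait.until(EC.visibility_of_element_located(
            (By.XPATH, "/html/body/div[9]/div/div[2]/div[" + str(mths[x]) + "]")))
    except TimeoutException:  # what wait.until actually raises on failure
        print(".", end=".", flush=True)
        time.sleep(1)
        timePeriodVal.click()  # re-open the dropdown and retry
        time.sleep(1)
        timePeriodVal.click()
        continue
    else:  # runs only when the wait succeeded; finally would run (and break) even on timeout
        timePeriod = driver.find_element_by_xpath(
            "/html/body/div[9]/div/div[2]/div[" + str(mths[x]) + "]")
        timePeriod.click()
        time.sleep(1.5)
        break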