I am trying to print the XPath data in the console, but the process stops after loading the first page without showing any errors.
This is my code:
browser.get('https://stackoverflow.com/questions?pagesize=10')
while True:
    try:
        elm = browser.find_element_by_link_text("next")
        elm.click()
        labels = browser.find_element_by_xpath('/html/body/div[3]/div[2]/div[1]/div[3]/div/div/div[1]/div[2]/h3/a')
        for label in labels:
            print label.text
    except:
        break
What am I missing?
You are not getting any errors because the bare except catches every exception and then just calls break.
You also had an issue with your XPath to the question labels: find_element_by_xpath returns a single element, which cannot be iterated, so find_elements_by_xpath is needed. I included a scroll to the next link in case you are receiving the cookies notification at the bottom like I was. Here is a working example:
NOTE: This was tested in Python 3 using the current Chrome build 67 and chromedriver 2.40.
import traceback
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
browser = webdriver.Chrome()  # assumes chromedriver is on your PATH
browser.get('https://stackoverflow.com/questions?pagesize=10')
while True:
    try:
        elm = browser.find_element_by_link_text("next")
        # scroll the link into view so a cookie banner cannot cover it
        browser.execute_script("return arguments[0].scrollIntoView();", elm)
        elm.click()
        # find_elements (plural) returns a list that can be iterated
        labels = browser.find_elements_by_xpath('.//a[@class="question-hyperlink"]')
        for label in labels:
            print(label.text)
    except NoSuchElementException:
        print("only catching this exception now when you run out of the next elements, other exceptions will raise")
        print(traceback.format_exc())
        break
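To see why the original loop stopped silently, here is a minimal sketch that runs without a browser. FakeElement is a hypothetical stand-in for the single WebElement returned by find_element_by_xpath, and LookupError is only a placeholder for NoSuchElementException; neither comes from the original post.

```python
class FakeElement:
    """Hypothetical stand-in for a single WebElement (it is not iterable)."""
    text = "some question title"


def loop_with_bare_except(labels):
    """Mimics the question's loop: any error at all ends it silently."""
    try:
        for label in labels:
            print(label.text)
    except:  # hides the TypeError raised by the for loop
        return "stopped silently"
    return "finished"


def loop_with_narrow_except(labels):
    """Only the expected exception is caught; everything else propagates."""
    try:
        for label in labels:
            print(label.text)
    except LookupError:  # placeholder for NoSuchElementException
        return "ran out of pages"
    return "finished"


silent = loop_with_bare_except(FakeElement())

try:
    loop_with_narrow_except(FakeElement())
    surfaced = None
except TypeError as exc:
    surfaced = str(exc)

print(silent)
print(surfaced)
```

With the narrow except clause, the TypeError about the non-iterable element reaches the console instead of being mistaken for "no more pages".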
Related
My task is to open each URL on the following website and retrieve some evaluation data for each essay. I have located the elements successfully, which means I get 10 elements. However, when Selenium starts clicking the links, it only opens the first of the ten.
https://esi.clarivate.com/DocumentsAction.action
HTML:
The code is as follows.
import time
from selenium import webdriver
driver=webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get('https://esi.clarivate.com/IndicatorsAction.action?Init=Yes&SrcApp=IC2LS&SID=H3-M1jrs4mSS2O3WTFbtdrUJugtDvogGRIM-18x2dx2B1ubex2Bo9Y5F6ZPQtUZbfUAx3Dx3Dp1StTsneXx2B7vu85UqXoaoQx3Dx3D-03Ff2gF3hTJGBPDScD1wSwx3Dx3D-cLUx2FoETAVeN3rTSMreq46gx3Dx3D')
#add filter-> research fields-> "clinical medicine"
target = driver.find_element_by_id("ext-gen1065")
time.sleep(1)
target.click()
time.sleep(1)
n = driver.window_handles
driver.switch_to.window(n[-1])
links=driver.find_elements_by_class_name("docTitle")
length=len(links)
for i in range(0,length):
    item=links[i]
    item.click()
    time.sleep(1)
    handles=driver.window_handles
    index_handle=driver.current_window_handle
    for handle in handles:
        if handle != index_handle:
            driver.switch_to.window(handle)
        else:
            continue
    time.sleep(1)
    u1=driver.find_elements_by_class_name("large-number")[2].text
    u2=driver.find_elements_by_class_name("large-number")[3].text
    print(u1,u2)
    print("\n")
    driver.close()
    time.sleep(1)
    driver.switch_to.window(index_handle)
driver.quit()
print("————finished————")
The error page:
I tried to find the problem by testing this code:
links=driver.find_elements_by_class_name("docTitle")
length=len(links)
print(length)
print(links[1].text)
#links[0].click()
links[1].click()
The result is:
which means it had already found the element but failed to open it. (When using links[0].text, it works fine.)
Any idea about this?
I'm a complete newbie at dealing with Python and Selenium, just started a week ago, so do excuse my mess of a code. I'm trying to extract all the 'structure_id' and 'd' information from elements with the tag name 'path' on this website and store each of them in a separate svg file. Here is a snippet of the code I'm having problems with:
for number in range(1,106):
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.ID, 'master_group'))
        )
        selected = driver.find_element_by_class_name('simstripImgSel')
        driver.get(driver.current_url)
        paths = driver.find_elements_by_tag_name('path')
        for path in paths:
            while True:
                try:
                    structure = path.get_attribute('structure_id')
                    d = path.get_attribute('d')
                    break
                except Exception as e:
                    print(e)
                    paths = driver.find_elements_by_tag_name('path')
                    continue
            if structure != None:
                print('Attributes copied.')
                for word, initial in data.items():
                    structure = structure.replace(word,initial)
                filepath = Path('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg')
                if filepath.is_file():
                    text = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg','r+')
                    rep = text.read()
                    rep = rep.replace('</svg>','<path id="')
                    text.close()
                    os.remove('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg')
                    time.sleep(0.2)
                    text = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg','w+')
                    text.write(rep+str(structure)+'" d="'+str(d)+'"></path></svg>')
                    text.close()
                    print('File '+str(structure)+' modified in slice '+str(number)+'!')
                else:
                    svg = open('C:\\Users\\justs\\Downloads\\Ordered_svgs\\slice'+str(number)+'\\'+str(structure)+'.svg','w+')
                    svg.write('<svg id="the_svg_wrapper" width="100%" height="100%" xmlns="http://www.w3.org/2000/svg"><path id="'+str(structure)+'" d="'+str(d)+'"></path></svg>')
                    svg.close()
                    print('File '+str(structure)+' made in slice '+str(number)+'!')
        selected.send_keys('F')
        paths = 0
        print()
    except Exception as e:
        print('Error.')
        print(e)
        break
print('Done!')
driver.quit()
This works fine for the first page, but I need to extract paths for all 106 pages, and after pressing 'F' once (which moves on to the next page) I get a stale element reference at the line structure = path.get_attribute('structure_id'). Initially I thought the paths took some time to load, hence the while loop, but by the second page it gets stuck with never-ending stale element references.
Explicit waits and refreshing the page didn't work either. I suspect the WebElement returned by driver.find_element_by_class_name isn't updating at all (when I refreshed the page after moving on to the next page, the files I extracted ended up being the same as the first page, and I got a stale element reference by page 5 anyway). How do I solve this? Any help is appreciated!
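You reloaded the URL inside the loop with driver.get(driver.current_url), so it went back to page 1. Load the page once before the loop and look the paths up again on every iteration: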
driver.get('http://atlas.brain-map.org/atlas?atlas=265297126#atlas=265297126&plate=102339919&structure=10155&x=42480&y=16378&zoom=-7&resolution=124.49&z=2')
for i in range(1,106):
    try:
        paths=WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.TAG_NAME, "path")))
        for path in paths:
            structure = path.get_attribute('structure_id')
            d = path.get_attribute('d')
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "simstripImgSel"))).send_keys("F")
        time.sleep(0.5)
    except Exception as e:
        print(e)
Imports:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
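One way to keep stale references out of the picture is to copy the attributes into plain Python values as soon as the elements are found, before pressing 'F'. The sketch below is only an illustration: snapshot_paths is a hypothetical helper, it assumes the generic driver.find_elements(by, value) call ("tag name" is the string behind By.TAG_NAME), and FakePath and FakeDriver are stand-ins so that it runs without a browser.

```python
def snapshot_paths(driver):
    """Return (structure_id, d) tuples for every <path> on the current page.

    The WebElements are read immediately and then discarded, so nothing
    that could go stale is kept once the page changes.
    """
    rows = []
    for path in driver.find_elements("tag name", "path"):
        structure = path.get_attribute("structure_id")
        if structure is not None:
            rows.append((structure, path.get_attribute("d")))
    return rows


# --- hypothetical stand-ins so the sketch runs without a browser ---
class FakePath:
    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)


class FakeDriver:
    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def find_elements(self, by, value):
        return [FakePath(attrs) for attrs in self.pages[self.index]]


driver = FakeDriver([
    [{"structure_id": "10155", "d": "M0 0L1 1"}, {"d": "M9 9"}],
    [{"structure_id": "997", "d": "M2 2L3 3"}],
])

collected = []
for page in range(len(driver.pages)):
    collected.append(snapshot_paths(driver))  # fresh lookup on every page
    if page < len(driver.pages) - 1:
        driver.index += 1  # stand-in for sending "F" to move to the next page

print(collected)
```

With a real driver, the same function would be called once per iteration, after the wait for the path elements and before send_keys("F").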
I am trying to download Excel files from a website using Selenium in headless mode. While it works perfectly fine in most cases, there are a few cases (some months of a year) where driver.find_element_by_xpath() fails to work as expected. I have been through many posts and thought that the element might not have appeared yet when the driver was looking for it, but that isn't the case: I checked it thoroughly and also tried to slow down the process using time.sleep(). On a side note, I also use driver.implicitly_wait() because the website takes a while to load content on the page. I couldn't use requests because the response to a GET request doesn't contain any data. My script is as follows:
from selenium import webdriver
import datetime
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
import os
import shutil
import time
import calendar
currentdir = os.path.dirname(__file__)
Initial_path = 'whateveritis'
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": f"{Initial_path}",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})
def save_hist_data(year, months):
    def waitUntilDownloadCompleted(maxTime=1200):
        driver.execute_script("window.open()")
        # switch to new tab
        driver.switch_to.window(driver.window_handles[-1])
        # navigate to chrome downloads
        driver.get('chrome://downloads')
        # define the endTime
        endTime = time.time() + maxTime
        while True:
            try:
                # get the download percentage
                downloadPercentage = driver.execute_script(
                    "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
                # check if downloadPercentage is 100 (otherwise the script will keep waiting)
                if downloadPercentage == 100:
                    # exit the method once it's completed
                    return downloadPercentage
            except:
                pass
            # wait for 1 second before checking the percentage next time
            time.sleep(1)
            # exit method if the download not completed with in MaxTime.
            if time.time() > endTime:
                break
    starts_on = 1
    for month in months:
        no_month = datetime.datetime.strptime(month, "%b").month
        no_of_days = calendar.monthrange(year, no_month)[1]
        print(f"{no_of_days} days in {month}-{year}")
        driver = webdriver.Chrome(executable_path="whereeveritexists", options=chrome_options)
        driver.maximize_window() #For maximizing window
        driver.implicitly_wait(20)
        driver.get("https://www.iexindia.com/marketdata/areaprice.aspx")
        select = Select(driver.find_element_by_name('ctl00$InnerContent$ddlPeriod'))
        select.select_by_visible_text('-Select Range-')
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$calFromDate$txt_Date']").click()
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwYears']"))
        select.select_by_visible_text(str(year))
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwMonths']"))
        select.select_by_visible_text(month)
        #PROBLEM IS WITH THIS BLOCK
        test=None
        while not test:
            try:
                driver.find_element_by_xpath(f"//td[@class='scwCells' and contains(text(),'{starts_on}')]").click()
                test=True
            except IndentationError:
                print('Entered except block -IE')
                driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{starts_on}')]").click()
                test=True
            except:
                print('Entered except block -IE-2')
                driver.find_element_by_xpath(f"//td[@class='scwInputDate' and contains(text(), '{starts_on}')]").click()
                test=True
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$calToDate$txt_Date']").click()
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwYears']"))
        select.select_by_visible_text(str(year))
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwMonths']"))
        select.select_by_visible_text(month)
        #PROBLEM IS WITH THIS BLOCK
        test=None
        while not test:
            try:
                driver.find_element_by_xpath(f"//td[@class='scwCells' and contains(text(), '{no_of_days}')]").click()
                # time.sleep(4)
                test=True
            except IndentationError:
                print('Entered except block -IE')
                driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{no_of_days}')]").click()
                # time.sleep(4)
                test=True
            except:
                # time.sleep(2)
                driver.find_element_by_xpath(f"//td[@class='scwInputDate' and contains(text(), '{no_of_days}')]").click()
                test=True
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$btnUpdateReport']").click()
        driver.find_element_by_xpath("//a[@title='Export drop down menu']").click()
        print("Right before excel button click")
        driver.find_element_by_xpath("//a[@title='Excel']").click()
        waitUntilDownloadCompleted(180)
        print("After the download potentially!")
        filename = max([Initial_path + f for f in os.listdir(Initial_path)],key=os.path.getctime)
        shutil.move(filename,os.path.join(Initial_path,f"{month}{year}.xlsx"))
        driver.quit()
def main():
    # years = list(range(2013,2015))
    # months = ['Jan', 'Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
    # for year in years:
    #     try:
    save_hist_data(2018, ['Mar'])
    #     except:
    #         pass
if __name__== '__main__':
    main()
The while loops are used to select the date element on the calendar (the month and year are already selected from the drop-downs). Because the website uses different classes depending on whether the date falls on a weekday or a weekend, I used try and except blocks to try all possible XPaths. The weird thing is that some months of a year simply don't work as expected. This is the link: "https://www.iexindia.com/marketdata/areaprice.aspx". In the case of Mar-2018 in particular, searching for the XPaths manually in the Chrome browser works and locates the 31st of Mar-2018, but when the Python script is executed it throws an error saying
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//td[@class='scwInputDate' and contains(text(), '31')]"}
(Session info: headless chrome=84.0.4147.105)
The issue is with the exception handling. In your code, the first lookup uses "//td[@class='scwCells' and contains(text(), '{no_of_days}')]". Since the cell for 31 March has the class scwCellsWeekend, that element is not found and a NoSuchElementException is raised.
The first except clause only handles IndentationError. A missing element is not an IndentationError, so control moves on to the next except clause.
The second except clause names no exception type, so it handles the NoSuchElementException. Inside it, the code searches for an element with the XPath //td[@class='scwInputDate' and contains(text(), '31')], which does not exist either, so a second NoSuchElementException is raised, and this time nothing catches it.
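The dispatch described above can be reproduced without a browser. In this sketch NoSuchElement and find are hypothetical stand-ins for Selenium's exception and element lookup, and the set passed to pick_day lists the classes that exist on the page.

```python
class NoSuchElement(Exception):
    """Hypothetical stand-in for selenium's NoSuchElementException."""


def find(selector, present):
    """Pretend lookup: succeeds only if the class exists on the page."""
    if selector not in present:
        raise NoSuchElement(selector)
    return selector


def pick_day(present):
    """Same try/except layout as the loop in the question."""
    try:
        return find("scwCells", present)
    except IndentationError:
        # Never reached: a failed lookup is not an IndentationError.
        return find("scwCellsWeekend", present)
    except:
        # Reached for every failed lookup, so scwCellsWeekend is never tried.
        return find("scwInputDate", present)


try:
    result = pick_day({"scwCellsWeekend"})
except NoSuchElement as exc:
    result = "NoSuchElement: " + str(exc)

print(result)
```

Even though the weekend cell exists, the lookup for it is skipped and the error names scwInputDate, which matches the traceback in the question.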
Instead of chaining so many exception handlers, you can combine the three alternatives into a single expression with the XPath union operator |, as below:
driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{no_of_days}')] | //td[@class='scwCells' and contains(text(), '{no_of_days}')] | //td[@class='scwInputDate' and contains(text(), '{no_of_days}')]").click()
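Since the same expression is needed for both the start and the end date, it may be convenient to build it in one place. The helper below is a hypothetical sketch (day_cell_xpath is not part of the original answer); it only assembles the string, using the three class names from the question.

```python
def day_cell_xpath(day, classes=("scwCells", "scwCellsWeekend", "scwInputDate")):
    """Build one XPath that matches the calendar cell for `day` in any class."""
    parts = [
        f"//td[@class='{cls}' and contains(text(), '{day}')]"
        for cls in classes
    ]
    # "|" is the XPath union operator: the result matches any of the parts.
    return " | ".join(parts)


xpath = day_cell_xpath(31)
print(xpath)
```

It would then be used as driver.find_element_by_xpath(day_cell_xpath(no_of_days)).click(). Note that contains(text(), '1') also matches cells such as 10 or 21; an exact comparison might be safer, but that has not been checked against this site.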
I'm attempting to use expected_conditions.element_to_be_clickable but it doesn't appear to be working. I'm still seeing "Element...is not clickable at point" errors in about 30% of the runs.
Here's the full error message:
selenium.common.exceptions.WebDriverException: Message: unknown
error: Element ... is not clickable at point (621, 337). Other
element would receive the click: ...
(Session info: chrome=60.0.3112.90)
(Driver info: chromedriver=2.26.436421 (6c1a3ab469ad86fd49c8d97ede4a6b96a49ca5f6),platform=Mac OS X 10.12.6
x86_64)
Here's the code I'm working with:
def wait_for_element_to_be_clickable(selector, timeout=10):
    global driver
    wd_wait = WebDriverWait(driver, timeout)
    wd_wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, selector)),
                  'waiting for element to be clickable ' + selector)
    print ('WAITING')
    return driver.find_element_by_css_selector(selector)
Update:
So now this is really odd. Even if I add a couple of fixed wait periods, it still throws the error message occasionally. Here's the code where the call is being made:
sleep(5)
elem = utils.wait_for_element_to_be_clickable('button.ant-btn-primary')
sleep(5)
elem.click()
Ended up creating my own custom function to handle the exception and perform retries:
""" custom clickable wait function that relies on exceptions. """
def custom_wait_clickable_and_click(selector, attempts=5):
    count = 0
    while count < attempts:
        try:
            wait(1)
            # This will throw an exception if it times out, which is what we want.
            # We only want to start trying to click it once we've confirmed that
            # selenium thinks it's visible and clickable.
            elem = wait_for_element_to_be_clickable(selector)
            elem.click()
            return elem
        except WebDriverException as e:
            if ('is not clickable at point' in str(e)):
                print('Retrying clicking on button.')
                count = count + 1
            else:
                raise e
    raise TimeoutException('custom_wait_clickable timed out')
The problem is stated in the error message.
Element ... is not clickable at point (621, 337). Other element would receive the click: ...
The problem is that some element, the details of which you removed from the error message, is in the way... on top of the element you are trying to click. In many cases, this is some dialog or some other UI element that is in the way. How to deal with this depends on the situation. If it's a dialog that is open, close it. If it's a dialog that you closed but the code is running fast, wait for some UI element of the dialog to be invisible (dialog is closed and no longer visible) then attempt the click. Generally it's just about reading the HTML of the element that is blocking, find it in the DOM, and figure out how to wait for it to disappear, etc.
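As a rough sketch of "wait for the blocker to disappear, then click": with a real driver the usual tool is WebDriverWait(driver, timeout).until(EC.invisibility_of_element_located((By.CSS_SELECTOR, blocker))). The version below swaps in a hand-rolled polling loop so that it runs without a browser. click_when_unblocked, the fake classes and the selector div.ant-modal-mask are all assumptions for illustration; "css selector" is the string behind By.CSS_SELECTOR.

```python
import time


def click_when_unblocked(driver, target_css, blocker_css, timeout=10.0, poll=0.2):
    """Wait until nothing matching blocker_css is displayed, then click target_css."""
    deadline = time.monotonic() + timeout
    while True:
        blockers = driver.find_elements("css selector", blocker_css)
        if not any(b.is_displayed() for b in blockers):
            break  # nothing is covering the target any more
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{blocker_css} still visible after {timeout}s")
        time.sleep(poll)
    target = driver.find_element("css selector", target_css)
    target.click()
    return target


# --- hypothetical stand-ins so the sketch runs without a browser ---
class FakeElement:
    def __init__(self, visible_checks=0):
        self.visible_checks = visible_checks  # reports visible this many times
        self.clicked = False

    def is_displayed(self):
        if self.visible_checks > 0:
            self.visible_checks -= 1
            return True
        return False

    def click(self):
        self.clicked = True


class FakeDriver:
    def __init__(self):
        self.overlay = FakeElement(visible_checks=2)
        self.button = FakeElement()

    def find_elements(self, by, value):
        return [self.overlay] if value == "div.ant-modal-mask" else []

    def find_element(self, by, value):
        return self.button


driver = FakeDriver()
clicked = click_when_unblocked(
    driver, "button.ant-btn-primary", "div.ant-modal-mask", timeout=2, poll=0.01
)
print(clicked.clicked)
```

The blocker selector has to come from the part of the error message that names the element receiving the click; the one used here is only a placeholder.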
I'm trying to look at several pages of one website with Selenium and PhantomJS().
The problem is that it started freezing and I can't figure out why. It probably has something to do with a timeout.
Here is the __init__ method of the class.
self.driver = webdriver.PhantomJS(service_args=["--load-images=false"])
self.wait = WebDriverWait(self.driver, 2)
And here is the method:
def click_next_page(self):
    log('click_next_page : '+self.driver.current_url) # THIS LINE RUNS
    rep = 0
    while 1:
        try:
            self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li.arr-rgt.active a'))) # IT MAY FREEZE HERE
            self.driver.find_element_by_css_selector('li.arr-rgt.active a').click()# IT MAY FREEZE HERE
            print 'NEXT' # DOESNT PRINT ANY TEXT SO THIS LINE NOT EXECUTED
            log('NEXT PAGE')
            return True
        except Exception as e:
            log('click next page EXCEPTION') # DONT HAVE THIS TEXT IN MY LOG SO IT DOES NOT RAISES ANY EXCEPTION
            self.driver.save_screenshot('click_next_page_exception.png')
            self.driver.back()
            self.driver.forward()
            rep += 1
            log('REPEAT '+str(rep))
            if rep>4:
                break
            sleep(4)
    return False
The problem is that it does not raise any exception or print any message.
The line log('click_next_page : '+self.driver.current_url) runs and then it freezes. I know this because click_next_page : http://.... is the last line in my log.
The problem is definitely somewhere here:
self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li.arr-rgt.active a')))
self.driver.find_element_by_css_selector('li.arr-rgt.active a').click()
But I can't work out where, because it does not raise any exception.
Could you give me some advice?
I don't know how Selenium behaves with PhantomJS, but I don't see any issues in your code. To find the exact problem, I suggest debugging it in smaller chunks, running one line at a time in the Python console (not by running the Python file).
So check with this:
>>> import time
>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> from selenium.webdriver.support.ui import WebDriverWait
>>> from selenium.webdriver.support import expected_conditions as EC
>>> driver = webdriver.PhantomJS(service_args=["--load-images=false"])
>>> wait = WebDriverWait(driver, 2)
>>> # ... code for clicking next page ...
>>> time.sleep(5)
>>> driver.find_element_by_css_selector('li.arr-rgt.active a')
This should return the Selenium WebElement matched by the CSS selector. If the element is not found, it will throw an error.
If the above code runs, re-run it with the following modifications:
>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS(service_args=["--load-images=false"])
>>> wait = WebDriverWait(driver, 2)
>>> # ... code for clicking next page ...
>>> wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'li.arr-rgt.active a')))
>>> driver.find_element_by_css_selector('li.arr-rgt.active a').click()
Here you will be able to check whether the problem is actually in wait.until(). If there is an error, you can pinpoint it by running the lines one by one. Hope this helps.