from selenium import webdriver
driver = webdriver.Firefox()
driver.get(url)
Sometimes the webdriver gets stuck on a file or response and the page never finishes loading, so the line
driver.get(url)
never returns. But by that point I already have enough of the page source to run the rest of my code. How can I bypass or refresh the page if it is not fully loaded within 10 seconds?
I have tried
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
driver = webdriver.Firefox()
driver.set_page_load_timeout(10)

while True:
    try:
        driver.get(url)
    except TimeoutException:
        print("Timeout, retrying...")
        continue
    else:
        break
but the line
driver.set_page_load_timeout(10)
always gives me the following error:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 727, in set_page_load_timeout
'pageLoad': int(float(time_to_wait) * 1000)})
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 238, in execute
self.error_handler.check_response(response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 193, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message:
There is nothing after Message:, so I can't identify the type of error. It's strange that my laptop can't run
driver.set_page_load_timeout(10)
My next step is to click a button on the page, but that button doesn't always exist even after the page is fully loaded, so I can't use an explicit wait.
Thanks
(Your code snippet doesn't define url, but I'll assume it is defined somewhere in your actual code.)
You could combine the retry and timeout-decorator packages for this:
from retry import retry
from timeout_decorator import timeout, TimeoutError
from selenium import webdriver

@retry(TimeoutError, tries=3)
@timeout(10)
def get_with_retry(driver, url):
    driver.get(url)

def main():
    url = "http://something.foo"
    driver = webdriver.Firefox()
    try:
        get_with_retry(driver, url)
        foo(driver)  # do whatever it is you need to do
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
Note that you would need to either not call driver.set_page_load_timeout at all, or set it to something higher than 10 seconds.
Related
I am using Selenium to run automated scripts in Python. I have used the tool for three years but have never encountered this issue. Does anyone know what could be causing it? I was able to determine that the error comes from the driver.get() call inside a for loop, but it only errors out after 7 iterations. Seems odd; thoughts?
Unhandled exception in thread started by <function crawl_games_and_store_data.<locals>.handle_incoming_request at 0x104659158>
Traceback (most recent call last):
File "/Users/z003bzf/Documents/personal/python/MLB/src/services/crawler.py", line 160, in handle_incoming_request
driver.get(game_link)
File "/Users/z003bzf/.local/share/virtualenvs/MLB-Ei2Ym8vD/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "/Users/z003bzf/.local/share/virtualenvs/MLB-Ei2Ym8vD/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 318, in execute
params = self._wrap_value(params)
File "/Users/z003bzf/.local/share/virtualenvs/MLB-Ei2Ym8vD/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 271, in _wrap_value
converted[key] = self._wrap_value(val)
File "/Users/z003bzf/.local/share/virtualenvs/MLB-Ei2Ym8vD/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 276, in _wrap_value
return list(self._wrap_value(item) for item in value)
Here is the code that is causing the issue
for elem in link_lst:
    driver.get(elem)
    time.sleep(.5)
    driver.find_element_by_xpath('//div[@class="box-row batting-row"]')
It could be the contents of link_lst, if one of the hosts times out. You would have to handle that exception to continue. One option is a try/except around both the page-load timeout and the failure to locate a page element. The timeout can be tuned both through a delay parameter and through the Firefox profile settings.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

def fetch(driver, link_list, delay):
    for item in link_list:
        try:
            driver.get(item)
        except TimeoutException:
            print("Timeout in link_list item")
        try:
            elem = WebDriverWait(driver, delay).until(
                EC.presence_of_element_located(
                    (By.XPATH, '//div[@class="box-row batting-row"]')))
        except TimeoutException:
            print("Locating element took too much time!")

if __name__ == '__main__':
    """ Setup profile """
    fp = webdriver.FirefoxProfile()
    fp.set_preference("http.response.timeout", 10)
    fp.set_preference("dom.max_script_run_time", 10)

    """ Prepare args """
    driver = webdriver.Firefox(firefox_profile=fp)
    link_list = ['http://abcdeftest.com', 'http://test.com']
    delay = 5

    fetch(driver, link_list, delay)
Getting the error "invalid element state" when using the Chrome driver for Selenium.
What I'm trying to do:
Pass some data to http://www.dhl.de/onlinefrankierung
My first issue is that when I try to use the .click() method on the checkbox named "Nachnahme", nothing happens; no check is made.
When you check the box manually, the page refreshes and additional fields appear, which are what I'm trying to access.
The second issue, which throws the invalid element state, happens when trying to pass in data using the .send_keys() method.
Here is my code so far:
from selenium import webdriver
driver = webdriver.Chrome('C:\\Users\\Owner\\AppData\\Local\\Programs\\Python\\Python36-32\\Lib\\site-packages\\chromedriver.exe')
driver.get('http://www.dhl.de/onlinefrankierung')
product_element = driver.find_element(by='id', value='bpc_PAK02')
product_element.click()
services_element = driver.find_element(by='id', value='sc_NNAHME')
services_element.click()
address_element_name = driver.find_element(by='name', value='formModel.sender.name')
address_element_name.send_keys("JackBlack")
ERROR:
C:\Users\Owner\AppData\Local\Programs\Python\Python36-32\python.exe "C:/Users/Owner/Desktop/UpWork/Marvin Sessner/script.py"
Traceback (most recent call last):
  File "C:/Users/Owner/Desktop/UpWork/Marvin Sessner/script.py", line 23, in
    address_element_name.send_keys("tester")
  File "C:\Users\Owner\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webelement.py", line 352, in send_keys
    'value': keys_to_typing(value)})
  File "C:\Users\Owner\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webelement.py", line 501, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Owner\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 308, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Owner\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidElementStateException: Message: invalid element state
  (Session info: chrome=HIDDEN)
  (Driver info: chromedriver=HIDDEN (5b82f2d2aae0ca24b877009200ced9065a772e73), platform=Windows NT 10.0.15063 x86_64)
Putting just a small sleep between the two actions solves the issue. The following code works fine.
Now, before someone downvotes or comments about sleep, let me clarify: is this the best solution? No, it's not.
But now you know why it was not working: your action triggers an AJAX request, and before it completes you try to perform another action, which causes the issue.
The good solution would be to write a condition that waits until that action is complete, but in the meantime you have a working temporary solution.
import time
from selenium import webdriver
driver = webdriver.Chrome('h:\\bin\\chromedriver.exe')
driver.get('http://www.dhl.de/onlinefrankierung')
product_element = driver.find_element(by='id', value='bpc_PAK02')
product_element.click()
time.sleep(5)
services_element = driver.find_element(by='id', value='sc_NNAHME')
services_element.click()
time.sleep(5)
address_element_name = driver.find_element(by='name', value='formModel.sender.name')
address_element_name.send_keys("JackBlack")
If you use explicit waits, you can often avoid this error. In particular, if an element can be clicked (an input, button, select, etc.), you can wait for it to be clickable. Here is what will work in your case.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def waitForElementClickable(timeout=5, method='id', locator=None):
    try:
        element = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((method, locator))
        )
        print("Element", method + '=' + locator, "can be clicked.")
        return element
    except Exception:
        print("Element", method + '=' + locator, "CANNOT be clicked.")
        raise
options = Options()
options.add_argument('--disable-infobars')
driver = webdriver.Chrome(chrome_options=options)
driver.get('http://www.dhl.de/onlinefrankierung')
product_element = waitForElementClickable(method='id', locator='bpc_PAK02')
product_element.click()
services_element = waitForElementClickable(method='id', locator='sc_NNAHME')
services_element.click()
address_element_name = waitForElementClickable(method='name', locator='formModel.sender.name')
address_element_name.send_keys("JackBlack")
This question already has answers here:
Selenium Web Driver & Java. Element is not clickable at point (x, y). Other element would receive the click
(9 answers)
Closed 5 years ago.
I tried to automate Amazon's novels list page using Selenium. It works sometimes and not others, and I can't see the mistake in the code. It once worked well enough to scroll through 13 of 20 pages, but since then it has not worked properly and has never gotten through all 20 pages.
from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup

class App:
    def __init__(self, path=r'F:\Imaging'):
        self.path = path
        self.driver = webdriver.Chrome(r'F:\chromedriver')
        self.driver.get('https://www.amazon.in/s/ref=sr_pg_1?rh=i%3Aaps%2Ck%3Anovels&keywords=novels&ie=UTF8&qid=1510727563')
        sleep(1)
        self.scroll_down()
        self.driver.close()

    def scroll_down(self):
        self.driver.execute_script("window.scrollTo(0,5500);")
        sleep(1)
        load_more = self.driver.find_element_by_xpath('//span[@class="pagnRA"]/a[@title="Next Page"]')
        load_more.click()
        sleep(2)
        for value in range(2, 19):
            print(self.driver.current_url)
            sleep(3)
            self.driver.execute_script("window.scrollTo(0,5500);")
            sleep(2)
            load_more = self.driver.find_element_by_xpath('//span[@class="pagnRA"]/a[@title="Next Page"]')
            load_more.click()
            sleep(3)

if __name__ == '__main__':
    app = App()
The output I am getting for this code is:
C:\Users\Akhil\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/Akhil/Scrape/amazon.py
https://www.amazon.in/s/ref=sr_pg_2/257-8503487-3570721?rh=i%3Aaps%2Ck%3Anovels&page=2&keywords=novels&ie=UTF8&qid=1510744188
https://www.amazon.in/s/ref=sr_pg_3?rh=i%3Aaps%2Ck%3Anovels&page=3&keywords=novels&ie=UTF8&qid=1510744197
https://www.amazon.in/s/ref=sr_pg_4?rh=i%3Aaps%2Ck%3Anovels&page=4&keywords=novels&ie=UTF8&qid=1510744204
Traceback (most recent call last):
File "C:/Users/Akhil/Scrape/amazon.py", line 31, in <module>
app=App()
File "C:/Users/Akhil/Scrape/amazon.py", line 11, in __init__
self.scroll_down()
File "C:/Users/Akhil/Scrape/amazon.py", line 26, in scroll_down
load_more.click()
File "C:\Users\Akhil\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "C:\Users\Akhil\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 501, in _execute
return self._parent.execute(command, params)
File "C:\Users\Akhil\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 308, in execute
self.error_handler.check_response(response)
File "C:\Users\Akhil\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element <a title="Next Page" id="pagnNextLink" class="pagnNext" href="/gp/search/ref=sr_pg_5?rh=i%3Aaps%2Ck%3Anovels&page=5&keywords=novels&ie=UTF8&qid=1510744210">...</a> is not clickable at point (809, 8). Other element would receive the click: ...
(Session info: chrome=62.0.3202.94)
(Driver info: chromedriver=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f),platform=Windows NT 10.0.15063 x86_64)
Process finished with exit code 1
How to solve this problem?
Try the following code:
load_more = ui.WebDriverWait(driver, timeout).until(
    EC.element_to_be_clickable((By.XPATH, '//span[@class="pagnRA"]/a[@title="Next Page"]')))
driver.execute_script("arguments[0].scrollIntoView(true);", load_more)
load_more.click()
where timeout is the time in seconds to wait for the element to become clickable.
Also, import the following at the beginning of the script:
from selenium.webdriver.support import ui
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
The error means that no Next Page element is visible or clickable. You can either wait for the element to be present and clickable, or put the .click() in a try/except block to detect when it fails.
It is likely that either your target has legitimately run out of next pages (you have seen them all), the page is still loading, or the format of the next-page link has changed.
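An alternative to the try/except suggested above is find_elements (plural), which returns an empty list instead of raising, so running out of pages ends the loop cleanly. A sketch; the helper name is invented and the XPath is taken from the question:

```python
def click_next_if_present(driver):
    """Click the 'Next Page' link if it exists; return False once
    there are no more pages."""
    links = driver.find_elements_by_xpath(
        '//span[@class="pagnRA"]/a[@title="Next Page"]')
    if not links:
        return False  # no next-page link: we have seen every page
    links[0].click()
    return True
```

A loop like `while click_next_if_present(driver): ...` then replaces the fixed range(2, 19) and stops itself on the last page.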
I finally got the correct result by making small modifications to the answer given by @RatmirAsanov.
Please see this code. It scrolls through all the pages without fail.
from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

class App:
    def __init__(self, path=r'F:\Imaging'):
        self.path = path
        self.driver = webdriver.Chrome(r'F:\chromedriver')
        self.driver.get('https://www.amazon.in/s/ref=sr_pg_1?rh=i%3Aaps%2Ck%3Anovels&keywords=novels&ie=UTF8&qid=1510727563')
        sleep(1)
        self.scroll_down()
        self.driver.close()

    def scroll_down(self):
        sleep(3)
        self.driver.execute_script("window.scrollTo(0,5450);")
        sleep(3)
        load_more = self.driver.find_element_by_xpath('//span[@class="pagnRA"]/a[@title="Next Page"]')
        load_more.click()
        sleep(3)
        for value in range(2, 19):
            print(self.driver.current_url)
            sleep(5)
            self.driver.execute_script("window.scrollTo(0,5500);")
            sleep(3)
            load_more = WebDriverWait(self.driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//span[@class='pagnRA']/a[@title='Next Page']")))
            self.driver.execute_script("arguments[0].click();", load_more)
            # load_more.click()
            sleep(3)
        sleep(3)

if __name__ == '__main__':
    APP = App()
Following is the sample code:
from selenium import webdriver

driver = webdriver.Firefox()
# (the browser window gets closed for some reason at this point)
driver.quit()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 183, in quit
    RemoteWebDriver.quit(self)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 592, in quit
    self.execute(Command.QUIT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Tried to run command without establishing a connection
Is there some way to check if an instance of webdriver is active?
Be Pythonic: try to quit, and catch the exception if it fails.
try:
driver.quit()
except WebDriverException:
pass
You can use something like this, which uses psutil:
from selenium import webdriver
import psutil

driver = webdriver.Firefox()
driver.get("http://tarunlalwani.com")

driver_process = psutil.Process(driver.service.process.pid)
if driver_process.is_running():
    print("driver is running")
    firefox_process = driver_process.children()
    if firefox_process:
        firefox_process = firefox_process[0]
        if firefox_process.is_running():
            print("Firefox is still running, we can quit")
            driver.quit()
        else:
            print("Firefox is dead, can't quit. Let's kill the driver")
            firefox_process.kill()
else:
    print("driver has died")
This is what I figured out and liked:
def setup(self):
    self.wd = webdriver.Firefox()

def teardown(self):
    # self.wd.service.process is None if it has already quit.
    if self.wd.service.process is not None:
        self.wd.quit()
Note: driver_process = psutil.Process(driver.service.process.pid) will throw an exception if the driver has already quit.
The answer by Corey Goldberg is the correct way.
However, if you really need to look under the hood, the driver.service.process attribute gives access to the underlying Popen object that manages the browser. Once the process has quit, the attribute is None, so testing whether it is truthy identifies the state of the browser:
from selenium import webdriver

driver = webdriver.Firefox()

# your code, in which the browser quits

if not driver.service.process:
    print('Browser has quit unexpectedly')

if driver.service.process:
    driver.quit()
In addition to Corey Goldberg's answer, and to scign's answer:
Don't forget the import:
from selenium.common.exceptions import WebDriverException
Also, with Corey's answer the code will hang for about 10 seconds while attempting to close an already-closed webdriver before reaching the except clause.
I have a web scraper that runs on my system, and I wanted to migrate it to PythonAnywhere, but after moving it, it no longer works.
Specifically, send_keys does not seem to work: after the following code executes, I never move on to the next web page, so an AttributeError gets tripped.
My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import csv
import time

# Lists for functions
parcel_link = []
token = []
csv_output = []

# main scraping function
def getLinks(link):
    # Open web browser and get url - 3 second time delay.
    driver.get(link)
    time.sleep(3)
    inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
    inputElement.send_keys(parcel_code + "*\n")  # '\n' submits like pressing ENTER
    print("ENTER hit")
    pageSource = driver.page_source
    bsObj = BeautifulSoup(pageSource, "html.parser")
    parcel_link.clear()
    print(bsObj)
    #pause = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "mResultscontrol_mGrid_RealDataGrid")))
    for link in bsObj.find(id="mResultscontrol_mGrid_RealDataGrid").findAll('a'):
        parcel_link.append(link.text)
    print(parcel_link)
    for test in parcel_link:
        clickable = driver.find_element_by_link_text(test)
        clickable.click()
        time.sleep(2)
The link I am trying to operate is:
https://ascendweb.jacksongov.org/ascend/%280yzb2gusuzb0kyvjwniv3255%29/search.aspx
and I am trying to send: 15-100*
TraceBack:
03:12 ~/Tax_Scrape $ xvfb-run python3.4 Jackson_Parcel_script.py
Traceback (most recent call last):
File "Jackson_Parcel_script.py", line 377, in <module>
getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
File "Jackson_Parcel_script.py", line 29, in getLinks
inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 206, in find_element_by_id
return self.find_element(by=By.ID, value=id_)
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 662, in find_element
{'using': by, 'value': value})['value']
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/errorhandler.py", line 164, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: 'Unable to locate element: {"method":"id","selector":"mSearchControl_mParcelID"}' ; Stacktrace:
    at FirefoxDriver.findElementInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9470)
    at FirefoxDriver.findElement (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9479)
    at DelayedCommand.executeInternal_/h (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11455)
    at DelayedCommand.executeInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11460)
    at DelayedCommand.execute/< (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11402)
03:13 ~/Tax_Scrape $
Selenium initiation:
for retry in range(3):
    try:
        driver = webdriver.Firefox()
        break
    except Exception:
        time.sleep(3)

for parcel_code in token:
    getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
PythonAnywhere uses a virtual instance of Firefox that is supposed to be headless, like PhantomJS, so I do not have a version number.
Any help would be great
RS
Well, maybe the browser used on PythonAnywhere does not load the site fast enough. So instead of time.sleep(3), try implicitly waiting for the element.
An implicit wait is to tell WebDriver to poll the DOM for a certain
amount of time when trying to find an element or elements if they are
not immediately available. The default setting is 0. Once set, the
implicit wait is set for the life of the WebDriver object instance.
Using time.sleep() with Selenium is not a good idea in general.
And give it more than just 3 seconds; with implicitly_wait() you specify the maximum time spent waiting for an element.
So if you set implicitly_wait(10) and the page loads in, say, 5 seconds, then Selenium will only wait 5 seconds.
driver.implicitly_wait(10)
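Applied to the scraper above, the idea is to set the implicit wait once and drop the fixed sleep; find_element then polls the DOM itself. A sketch with the element id taken from the question; open_search is an invented name:

```python
def open_search(driver, link, element_id="mSearchControl_mParcelID"):
    """Load the page and find the parcel-ID input, relying on the
    implicit wait instead of a fixed time.sleep(3)."""
    driver.implicitly_wait(10)  # applies to every subsequent find_element call
    driver.get(link)
    return driver.find_element_by_id(element_id)
```

If the element still cannot be found after 10 seconds, you get the same NoSuchElementException, which at least tells you the problem is the headless browser rendering rather than timing.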