I'm just running the example code of selenium from here:
http://selenium.googlecode.com/svn/trunk/docs/api/py/index.html
The code is:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
try:
    browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
except NoSuchElementException:
    assert 0, "can't find seleniumhq"
browser.close()
But it doesn't work for me; here's the response:
Traceback (most recent call last):
File "test.py", line 4, in <module>
driver = webdriver.Firefox()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 62, in __init__
desired_capabilities=capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 72, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 114, in start_session
'desiredCapabilities': desired_capabilities,
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 165, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 136, in check_response
raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message:
...
<div id="content">
<p>The following error was encountered while trying to retrieve the URL: http://127.0.0.1:60106/hub/session</p>
<blockquote id="error">
<p><b>Connection to 127.0.0.1 failed.</b></p>
</blockquote>
<p id="sysmsg">The system returned: <i>(111) Connection refused</i></p>
<p>The remote host or network may be down. Please try the request again.</p>
...
You aren't running the full example. The link you posted contains the following code:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
assert "Yahoo!" in browser.title
elem = browser.find_element_by_name("p") # Find the query box
elem.send_keys("seleniumhq" + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
try:
    browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
except NoSuchElementException:
    assert 0, "can't find seleniumhq"
browser.close()
This works fine.
The edited version of the code in your question is missing some parts and therefore fails. Specifically, you are missing these two lines:
elem = browser.find_element_by_name("p") # Find the query box
elem.send_keys("seleniumhq" + Keys.RETURN)
Those two lines initiate a Yahoo search for "seleniumhq". The results of that search are the content in which you want to locate the element.
If you don't do the search, it will fail on:
browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
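As a side note, the example's fixed time.sleep(0.2) is fragile. A sketch of the same flow with an explicit wait instead (assuming the page structure hasn't changed) would be:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
try:
    browser.get("http://www.yahoo.com")
    elem = browser.find_element_by_name("p")  # the query box, as in the example
    elem.send_keys("seleniumhq" + Keys.RETURN)
    # wait up to 10 seconds for the result link instead of sleeping a fixed time
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, "//a[contains(@href,'http://seleniumhq.org')]")
        )
    )
finally:
    browser.close()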
When Selenium launches Firefox with
browser = webdriver.Firefox()
the first address it visits is localhost (127.0.0.1:xxxxx).
If you are using a proxy server, localhost cannot be reached while the proxy is set.
So, first you need to turn off your proxies using:
unset http_proxy
unset ftp_proxy
unset socks_proxy
unset https_proxy
Once you have turned your proxies off, Firefox should start without any error. But now you need to set your proxies in Firefox itself; the technique described here works.
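If you prefer to do it from Selenium, a minimal sketch is to set the proxy on the Firefox profile that Selenium launches (proxy.example.com:8080 is a placeholder, substitute your own):
from selenium import webdriver

PROXY_HOST = "proxy.example.com"  # placeholder, substitute your proxy host
PROXY_PORT = 8080                 # placeholder port

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.http", PROXY_HOST)
profile.set_preference("network.proxy.http_port", PROXY_PORT)
profile.set_preference("network.proxy.ssl", PROXY_HOST)
profile.set_preference("network.proxy.ssl_port", PROXY_PORT)
# keep localhost direct so Selenium can still reach the browser
profile.set_preference("network.proxy.no_proxies_on", "localhost,127.0.0.1")
browser = webdriver.Firefox(firefox_profile=profile)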
Related
I am trying to use Selenium with ChromeDriver to connect to a website, but the site cannot be reached.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
CHROME_EXECUTABLE_PATH = "C://Program Files (x86)//Chrome Driver//chromedriver.exe"
CHROME_OPTIONS = webdriver.ChromeOptions()
CHROME_OPTIONS.add_argument("--disable-notifications")
BASE_URL = "https://www.nordstrom.com/"
driver = webdriver.Chrome(executable_path=CHROME_EXECUTABLE_PATH, options=CHROME_OPTIONS)
# locators
search_button_locator = "//a[@id='controls-keyword-search-popover']"
search_box_locator = "//*[@id='keyword-search-input']"
driver.get(BASE_URL)
driver.find_element(By.XPATH, search_button_locator)
driver.find_element(By.XPATH, search_box_locator).send_keys("Fave Slipper")
This code gives me some error:
E:\Python\Nordstrom.com\venv\Scripts\python.exe E:/Python/Nordstrom.com/pages/simple.py
Traceback (most recent call last):
File "E:\Python\Nordstrom.com\pages\simple.py", line 14, in <module>
driver.find_element(By.XPATH, search_button_locator)
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@id='controls-keyword-search-popover']"}
(Session info: chrome=94.0.4606.61)
Process finished with exit code 1
The page looks like this:
But the expected page should look like this:
How can I access this website?
The error points out that it was unable to find the XPATH element, which is why it errored out.
The main causes for this can be either:
the XPATH is wrong
the element has not loaded yet on the page
the site has detected your scraping attempt and blocked you
In this case it's a combination of the 2nd and 3rd options. Whenever you use a webdriver, it exposes JavaScript hooks that websites can detect. To hide your activity you should learn more about how device fingerprinting works, and either customize your script to hide itself or use a pre-made solution for it (such as PhantomJS).
Most likely you should also look into hiding your IP by using a proxy.
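As a hedged starting point (not a complete stealth solution), these Chrome options remove the most obvious automation signals:
from selenium import webdriver

options = webdriver.ChromeOptions()
# hide the "Chrome is being controlled by automated test software" banner
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# keep navigator.webdriver from being set to true in newer Chrome versions
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=options)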
There may also be a problem with your BASE_URL, so try another browser to debug the issue. Also try to use an explicit wait before clicking or locating any element; see the sketch below.
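A minimal sketch of such an explicit wait, reusing the locators from the question (the 20-second timeout is an arbitrary choice):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, search_button_locator))).click()
wait.until(EC.visibility_of_element_located((By.XPATH, search_box_locator))).send_keys("Fave Slipper")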
I have a Python script which uses Selenium and ChromeDriver.
It ran on my CentOS 8 VPS perfectly for 3 days without any problem.
But since this morning, the script launches, waits almost 80 seconds, and displays this:
[12/Jan/2021 23:04:51] ERROR - Failed : Message: chrome not reachable
Traceback (most recent call last):
File "script.py", line 55, in <module>
driver = launch()
File "script.py", line 37, in launch
browser = webdriver.Chrome('/usr/bin/chromedriver',chrome_options=chrome_options)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
No modifications have been made, so why does it fail now?
I don't have any screen on my VPS so I can't see more information.
Here is some info:
yum info on chromedriver:
Name         : chromedriver
Version      : 87.0.4280.88
Release      : 1.el8
Architecture : x86_64
Size         : 27 M
Source       : chromium-87.0.4280.88-1.el8.src.rpm
Repository   : @System
From repo    : epel
google-chrome --version:
Google Chrome 87.0.4280.141
Beginning of the script:
from dotenv import load_dotenv
from logger import logger as l
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
import time
import sys
import subprocess
load_dotenv(verbose=True)
dotenv_path = '.env'
load_dotenv(dotenv_path)
def launch():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    browser = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)
    l.info('Started Chrome')
    return browser
Problem solved, but I don't understand how.
I just restarted my VPS (reboot), and ... it works again.
Weird.
EDIT: Found out why!
I had simply made a mistake at the end of my script:
b.close();
But "b" doesn't exist; my driver variable is named "driver".
The exception was caught and not displayed, so I didn't see anything.
But today I ran a "top" command and saw all the "chrome" processes running in the background.
Probably after several days the memory was full and Chrome couldn't launch.
The error was not clear, but anyway, it was my fault.
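One hedged way to make that kind of leak impossible is to quit the driver in a finally block, so even a typo in the cleanup line cannot strand Chrome processes:
driver = launch()
try:
    pass  # ... scraping work goes here ...
finally:
    # quit() tears down both the browser and the chromedriver process;
    # close() only closes the current window
    driver.quit()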
Rule of thumb
A common cause for Chrome to crash during startup is running Chrome as root user (administrator) on Linux. While it is possible to work around this issue by passing --no-sandbox flag when creating your WebDriver session, such a configuration is unsupported and highly discouraged. You need to configure your environment to run Chrome as a regular user instead.
Solution
Remove the following chrome_options:
--no-sandbox
and execute your code as a non-root user.
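As a sketch, the launch() function from the question would then reduce to this (run as a regular user, not root):
def launch():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    # no --no-sandbox here: run the script as a regular user instead
    browser = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)
    return browser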
Outro
Here is the link to the Sandbox story.
I am running chromedriver to try and scrape some data off of a website. Everything works fine without the headless option. However, when I add the option the webdriver takes a very long time to load the url, and when I try to find an element (that is found when run without --headless), I receive an error.
Using print statements and getting the HTML after the url "loaded", I find that there is no HTML; it's empty (see the output below).
class Fidelity:
    def __init__(self):
        self.url = 'https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml'
        self.options = Options()
        self.options.add_argument("--headless")
        self.options.add_argument("--window-size=1500,1000")
        self.driver = webdriver.Chrome(executable_path='.\\dependencies\\chromedriver.exe', options=self.options)
        print("init")

    def initiate_browser(self):
        self.driver.get(self.url)
        time.sleep(5)
        script = self.driver.execute_script("return document.documentElement.outerHTML")
        print(script)
        print("got url")

    def find_orders(self):
        wait = WebDriverWait(self.driver, 15)
        data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))  # ERROR ON THIS LINE
This is the entire output:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 102, in <module>
orders = scrape.find_tesla_orders()
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 75, in find_tesla_orders
tesla = self.driver.find_element_by_xpath("//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']")
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']"}
(Session info: headless chrome=74.0.3729.169)
(Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)
New error with updated code:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 104, in <module>
orders = scrape.find_tesla_orders()
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 76, in find_tesla_orders
tesla = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I have tried finding the answer to this through google but none of the suggestions work. Is anyone else having this issue with certain websites? Any help appreciated.
Update
Unfortunately this script still does not work; the webdriver does not load the page correctly while headless, even though everything works perfectly without the headless option.
For anyone in the future wondering about the fix: some websites just don't load correctly with the headless option of Chrome, and I don't think there is a way to fix that. Just use a different browser (like Firefox). Thanks to user8426627 for this.
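A minimal headless Firefox sketch (assuming geckodriver is installed and on the PATH):
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get("https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml")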
Have you tried using a User-Agent?
I was experiencing the same error. The first thing I did was download the HTML source page in both headless and normal modes with:
html = driver.page_source
file = open("foo.html","w")
file.write(html)
file.close()
The HTML source code for the headless mode was a short file with this line nearly at the end: The page cannot be displayed. Please contact the administrator for additional information. But the normal mode was the expected HTML.
I solved the issue by adding a User-Agent:
from fake_useragent import UserAgent  # optional: user_agent = UserAgent().random also works
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(executable_path=f"your_path", chrome_options=chrome_options)
Try setting the window size as well as being headless. Add this:
chromeOptions.add_argument("--window-size=1920,1080")
The default size of the headless browser is tiny. If the code works when headless is not enabled, it might be because your element is outside the window.
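Combined with the options from the question, a hedged sketch would be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")  # give the headless browser a real viewport
driver = webdriver.Chrome(executable_path='.\\dependencies\\chromedriver.exe', options=options)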
Add an explicit wait. You should also use another locator; the current one matches 3 elements. The element has a unique id attribute:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
wait = WebDriverWait(self.driver, timeout)
data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
some websites just don't load correctly with the headless option of chrome.
The previous statement is actually wrong. I just ran into this problem where Chrome wasn't detecting the elements. When I saw the @LuckyZakary answer I was shocked, because someone created a scraper for the same website with Node.js and didn't get this error.
The @AtulGumar answer helped on Windows, but on an Ubuntu server it failed, so it wasn't enough. After reading this all the way to the bottom, what @AtulGumar missed was adding the --disable-gpu flag.
So it works for me on Windows and on an Ubuntu server with no GUI with these options:
webOptions = webdriver.ChromeOptions()
webOptions.headless = True
webOptions.add_argument("--window-size=1920,1080")
webOptions.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=webOptions)
I also installed xvfb and other packages as suggested here:
sudo apt-get -y install xorg xvfb gtk2-engines-pixbuf
sudo apt-get -y install dbus-x11 xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable
and executed:
Xvfb -ac :99 -screen 0 1280x1024x16 &
export DISPLAY=:99
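Alternatively, xvfb-run can wrap the whole invocation so you don't have to start Xvfb and export DISPLAY yourself (a sketch; script.py is a placeholder for your script name):
xvfb-run -a --server-args="-screen 0 1280x1024x16" python3 script.py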
Try adding the executable path into a Service object:
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--incognito')
options.add_argument('--disable-extensions')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--headless')
service = Service(executable_path=ChromeDriverManager().install())
return webdriver.Chrome(service=service, options=options)
It works for me :)
I have a web scraper that runs on my system, and I wanted to migrate it over to PythonAnywhere, but now that I have moved it, it doesn't work.
Specifically, send_keys does not seem to work: after the following code is executed I never move on to the next webpage, so an attribute error gets tripped.
My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import csv
import time
# Lists for functions
parcel_link = []
token = []
csv_output = []

# main scraping function
def getLinks(link):
    # Open web browser and get url - 3 second time delay.
    driver.get(link)
    time.sleep(3)
    inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
    inputElement.send_keys(parcel_code + "*/n")
    print("ENTER hit")
    pageSource = driver.page_source
    bsObj = BeautifulSoup(pageSource)
    parcel_link.clear()
    print(bsObj)
    #pause = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "mResultscontrol_mGrid_RealDataGrid")))
    for link in bsObj.find(id="mResultscontrol_mGrid_RealDataGrid").findAll('a'):
        parcel_link.append(link.text)
    print(parcel_link)
    for test in parcel_link:
        clickable = driver.find_element_by_link_text(test)
        clickable.click()
        time.sleep(2)
The link I am trying to operate on is:
https://ascendweb.jacksongov.org/ascend/%280yzb2gusuzb0kyvjwniv3255%29/search.aspx
and I am trying to send: 15-100*
Traceback:
03:12 ~/Tax_Scrape $ xvfb-run python3.4 Jackson_Parcel_script.py
Traceback (most recent call last):
File "Jackson_Parcel_script.py", line 377, in <module>
getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
File "Jackson_Parcel_script.py", line 29, in getLinks
inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 206, in find_element_by_id
return self.find_element(by=By.ID, value=id_)
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 662, in find_element
{'using': by, 'value': value})['value']
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/errorhandler.py", line 164, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: 'Unable to locate element: {"method":"id","selector":"mSearchControl_mParcelID"}' ; Stacktrace:
at FirefoxDriver.findElementInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9470)
at FirefoxDriver.findElement (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9479)
at DelayedCommand.executeInternal_/h (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11455)
at DelayedCommand.executeInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11460)
at DelayedCommand.execute/< (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11402)
03:13 ~/Tax_Scrape $
Selenium initiation:
for retry in range(3):
    try:
        driver = webdriver.Firefox()
        break
    except:
        time.sleep(3)

for parcel_code in token:
    getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
PythonAnywhere uses a virtual instance of Firefox that is supposed to be headless, like PhantomJS, so I do not have a version number.
Any help would be great
RS
Well, maybe the browser used on PythonAnywhere does not load the site fast enough. So instead of time.sleep(3) try implicitly waiting for the element.
An implicit wait is to tell WebDriver to poll the DOM for a certain
amount of time when trying to find an element or elements if they are
not immediately available. The default setting is 0. Once set, the
implicit wait is set for the life of the WebDriver object instance.
Using time.sleep() with Selenium is not a good idea in general.
And give it more than just 3 seconds; with implicitly_wait() you specify the maximum time spent waiting for an element.
So if you set implicitly_wait(10) and the element appears in, for example, 5 seconds, then Selenium will only wait 5 seconds.
driver.implicitly_wait(10)
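Applied to the code in the question, a hedged sketch with an explicit wait on the parcel-ID box would be as follows (note: the "*/n" in the question looks like it was meant to be "*\n", i.e. a trailing newline):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get(link)
# wait up to 30 seconds for the parcel-ID box instead of a fixed sleep
inputElement = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "mSearchControl_mParcelID"))
)
inputElement.send_keys(parcel_code + "*\n")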
I am using Selenium WebDriver 2.25. I have a local hub set up with this JSON setting for Chrome and Firefox:
[
  {
    "browserName": "firefox",
    "maxInstances": 5,
    "seleniumProtocol": "WebDriver"
  },
  {
    "browserName": "chrome",
    "maxInstances": 5,
    "seleniumProtocol": "WebDriver"
  }
],
I can start a webdriver firefox session like this:
capability = getattr(webdriver.DesiredCapabilities, "FIREFOX")
dd=webdriver.Remote('http://localhost:4444/wd/hub', capability)
which works fine
but if I try to start a Chrome session like this:
capability = getattr(webdriver.DesiredCapabilities, "CHROME")
dd=webdriver.Remote('http://localhost:4444/wd/hub', capability)
I get this error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 62, in init
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 104, in start_session
'desiredCapabilities': desired_capabilities,
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 155, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: None ; Stacktrace: Method innerGet threw an error in None
But I can start a direct connection to Chrome like this:
dd=webdriver.Chrome()
Without any problem.
What can I do to get to Chrome through my Selenium Hub?
I had EXACTLY the same problem.
The thing is, unlike Firefox, Chrome needs a separate chromedriver.exe to act as a bridge between the browser and the driver.
From the documentation:
The ChromeDriver consists of three separate pieces. There is the
browser itself ("chrome"), the language bindings provided by the
Selenium project ("the driver") and an executable downloaded from the
Chromium project which acts as a bridge between "chrome" and the
"driver". This executable is called "chromedriver", but we'll try and
refer to it as the "server" in this page to reduce confusion.
Download chromedriver.exe here
and put it in your Chrome binary dir.
I then use a .bat file to launch my hub with this listing:
java -Dwebdriver.chrome.driver="C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe" -jar D:\soft\selenium-server-standalone-2.29.0.jar
I then executed the following Python code on my Linux box; it worked flawlessly once I put chromedriver.exe in the Chrome dir and launched the hub with the correct path parameters:
from selenium import webdriver
url = "http://192.168.1.115:4444/wd/hub"
driver = webdriver.Remote(command_executor = url, desired_capabilities = {'browserName':'chrome'})
driver.get("http://google.com")
Hope this helps you and others with the same problem. The key to finding the solution was, of course, not taking the Firefox approach for granted and reading the docs:
Chrome driver documentation
You need to set up the chrome driver; there is info about that here.
UPDATE
Based on a sample JSON setup file
and the steps provided in the first link, it seems the browser name should not be upper case but in fact lower case.
So change CHROME to chrome
Example
WebDriver driver = new RemoteWebDriver("http://localhost:9515", DesiredCapabilities.chrome());
driver.get("http://www.google.com");
and in your case, I would assume (in Python the capabilities are just a dict with a lowercase browser name):
dd = webdriver.Remote('http://localhost:4444/wd/hub', {'browserName': 'chrome'})