I am running chromedriver to try and scrape some data off of a website. Everything works fine without the headless option. However, when I add the option the webdriver takes a very long time to load the url, and when I try to find an element (that is found when run without --headless), I receive an error.
Using print statements to dump the HTML after the URL has supposedly loaded, I find that the document is empty (see the output below).
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

class Fidelity:
    def __init__(self):
        self.url = 'https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml'
        self.options = Options()
        self.options.add_argument("--headless")
        self.options.add_argument("--window-size=1500,1000")
        self.driver = webdriver.Chrome(executable_path='.\\dependencies\\chromedriver.exe', options=self.options)
        print("init")

    def initiate_browser(self):
        self.driver.get(self.url)
        time.sleep(5)
        # Dump the rendered document to see what headless Chrome actually received
        script = self.driver.execute_script("return document.documentElement.outerHTML")
        print(script)
        print("got url")

    def find_orders(self):
        wait = WebDriverWait(self.driver, 15)
        data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))  # ERROR ON THIS LINE
This is the entire output:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 102, in <module>
orders = scrape.find_tesla_orders()
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 75, in find_tesla_orders
tesla = self.driver.find_element_by_xpath("//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']")
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']"}
(Session info: headless chrome=74.0.3729.169)
(Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)
New error with updated code:
init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 104, in <module>
orders = scrape.find_tesla_orders()
File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 76, in find_tesla_orders
tesla = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I have tried finding the answer to this through google but none of the suggestions work. Is anyone else having this issue with certain websites? Any help appreciated.
Update
This script still does not work unfortunately, the webdriver is not loading the page correctly for some reason while headless, even though everything works perfectly without running this using the headless option.
For anyone in the future who is wondering the fix to this, some websites just don't load correctly with the headless option of chrome. I don't think there is a way to fix this. Just use a different browser (like firefox). Thanks to user8426627 for this.
Have you tried using a User-Agent?
I was experiencing the same error. First what I did was to download the HTML source page for both headless and normal with:
html = driver.page_source
file = open("foo.html","w")
file.write(html)
file.close()
The HTML source for headless mode was a short file with this line near the end: "The page cannot be displayed. Please contact the administrator for additional information." The source for normal mode was the expected HTML.
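The comparison described above can be automated with a rough check on the saved page source. This is only a sketch: the function name `diagnose_page_source` and the `BLOCK_HINTS` tuple are made up for illustration, the first hint is the message quoted above, and the second one is a guess that should be adapted to the site.

```python
import re

# Phrases that hint at a block page. The first is quoted from the answer above;
# the second is an illustrative guess and should be adapted per site.
BLOCK_HINTS = (
    "the page cannot be displayed",
    "access denied",
)

def diagnose_page_source(html: str) -> str:
    """Classify page source as 'empty', 'blocked', or 'ok' (heuristic only)."""
    # Remove tags and surrounding whitespace to see whether any text remains.
    text = re.sub(r"<[^>]*>", "", html).strip()
    if not text:
        return "empty"
    lowered = text.lower()
    if any(hint in lowered for hint in BLOCK_HINTS):
        return "blocked"
    return "ok"

# The empty document printed in the question is reported as 'empty'.
print(diagnose_page_source("<html><head></head><body></body></html>"))
```

Calling it on `driver.page_source` right after `driver.get(...)` should tell an empty headless page apart from a block page before any element lookup is attempted.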
I solved the issue by adding a User-Agent:
from selenium import webdriver
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
chrome_options = webdriver.ChromeOptions()
# Override the default headless User-Agent with a regular browser string
chrome_options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(executable_path="your_path", options=chrome_options)
Try setting the window size as well as being headless. Add this:
chromeOptions.add_argument("--window-size=1920,1080")
The default size of the headless browser is tiny. If the code works when headless is not enabled it might be because your object is outside the window.
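To keep the headless settings from this thread in one place, the arguments can be built by a small helper. This is a sketch under assumptions: `build_headless_args` is a hypothetical name, and the 1920x1080 default is simply the size suggested above.

```python
from typing import List, Optional

def build_headless_args(width: int = 1920, height: int = 1080,
                        user_agent: Optional[str] = None) -> List[str]:
    """Return Chrome command-line arguments for a headless session."""
    if width <= 0 or height <= 0:
        raise ValueError("window dimensions must be positive")
    args = ["--headless", f"--window-size={width},{height}"]
    if user_agent:
        # Optional override of the default headless User-Agent string.
        args.append(f"--user-agent={user_agent}")
    return args

# Each entry would be passed to add_argument(...) on a ChromeOptions object.
print(build_headless_args())
```

With Selenium this would be used as `for arg in build_headless_args(): chromeOptions.add_argument(arg)`.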
Add an explicit wait. You should also use a different locator, because the current one matches three elements. The element has a unique id attribute:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
wait = WebDriverWait(self.driver, timeout)
data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
"some websites just don't load correctly with the headless option of chrome."
The previous statement is actually wrong. I just ran into this problem, where Chrome wasn't detecting the elements. When I saw @LuckyZakary's answer I was surprised, because someone had written a scraper for the same website with Node.js and didn't get this error.
@AtulGumar's answer helped on Windows, but on an Ubuntu server it failed, so it wasn't enough. After reading this, all the way to the bottom, I found that what @AtulGumar missed was the --disable-gpu flag.
So it works for me on Windows and on an Ubuntu server with no GUI, with these options:
webOptions = webdriver.ChromeOptions()
webOptions.headless = True
webOptions.add_argument("--window-size=1920,1080")
webOptions.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=webOptions)
I also installed xvfb and other packages as suggested here:
sudo apt-get -y install xorg xvfb gtk2-engines-pixbuf
sudo apt-get -y install dbus-x11 xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable
and executed:
Xvfb -ac :99 -screen 0 1280x1024x16 &
export DISPLAY=:99
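A side note on flags: switches copied from web pages sometimes pick up a typographic dash or an extra hyphen, and Chrome will not recognize such an argument as the intended switch. The check below is a rough sketch; `find_malformed_switches` is a hypothetical helper, and the pattern is my assumption about what a typical switch looks like (lowercase name, optional leading `--`, optional `=value`).

```python
import re
from typing import Iterable, List

# Assumed shape of a switch: optional "--", lowercase words joined by single
# hyphens, then an optional "=value" part.
_SWITCH = re.compile(r"^(?:--)?[a-z0-9]+(?:-[a-z0-9]+)*(?:=.*)?$")

def find_malformed_switches(flags: Iterable[str]) -> List[str]:
    """Return the flags that do not look like well-formed Chrome switches."""
    return [flag for flag in flags if not _SWITCH.match(flag)]

flags = [
    "--window-size=1920,1080",
    "\u2013disable-gpu",   # en dash instead of two ASCII hyphens
    "---incognito",        # three hyphens
    "--headless",
]
print(find_malformed_switches(flags))
```

Running it over the list of arguments before building the options object makes this kind of typo visible early instead of silently losing the flag.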
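Try passing the executable path through a Service object:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
options.add_argument('--incognito')
options.add_argument('--disable-extensions')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--headless')
# webdriver_manager downloads a matching chromedriver and returns its path
service = Service(executable_path=ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
It works for me :)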
Related
I am trying to use selenium with chrome driver to connect to a website. But it couldn't be reached.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
CHROME_EXECUTABLE_PATH = "C://Program Files (x86)//Chrome Driver//chromedriver.exe"
CHROME_OPTIONS = webdriver.ChromeOptions()
CHROME_OPTIONS.add_argument("--disable-notifications")
BASE_URL = "https://www.nordstrom.com/"
driver = webdriver.Chrome(executable_path=CHROME_EXECUTABLE_PATH, options=CHROME_OPTIONS)
# locators
search_button_locator = "//a[@id='controls-keyword-search-popover']"
search_box_locator = "//*[@id='keyword-search-input']"
driver.get(BASE_URL)
driver.find_element(By.XPATH, search_button_locator)
driver.find_element(By.XPATH, search_box_locator).send_keys("Fave Slipper")
This code gives me an error:
E:\Python\Nordstrom.com\venv\Scripts\python.exe E:/Python/Nordstrom.com/pages/simple.py
Traceback (most recent call last):
File "E:\Python\Nordstrom.com\pages\simple.py", line 14, in <module>
driver.find_element(By.XPATH, search_button_locator)
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "E:\Python\Nordstrom.com\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@id='controls-keyword-search-popover']"}
(Session info: chrome=94.0.4606.61)
Process finished with exit code 1
The page looks like this:
But the expected page should look like this:
How can I access this website?
The error says that Selenium could not find any element matching the XPath, which is why the script failed.
The main causes for this can be either:
the XPATH is wrong
the element has not loaded yet on the page
the site has detected your scraping attempt and blocked you
In this case it's a combination of the 2nd and 3rd options. Whenever you use a webdriver, it exposes JavaScript hooks that websites can detect. To hide your activity you should learn more about how device fingerprinting works, and either customize your script to hide itself or use a pre-made solution for it (such as PhantomJS).
Most likely you should also look into hiding your IP by using a proxy.
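To rule out the first cause (a wrong XPath), a locator can be tried offline against a saved fragment of the page. The sketch below uses the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup; the `SAMPLE` fragment and the `count_matches` helper are made up for illustration and are not the real Nordstrom page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment standing in for saved page source.
SAMPLE = """
<div>
  <a id="controls-keyword-search-popover" href="/search">Search</a>
  <input id="keyword-search-input" />
</div>
"""

def count_matches(markup: str, xpath: str) -> int:
    """Count elements in the markup that match an ElementTree-style XPath."""
    root = ET.fromstring(markup)
    return len(root.findall(xpath))

# ElementTree expects a relative path, hence the leading "." before "//".
print(count_matches(SAMPLE, ".//a[@id='controls-keyword-search-popover']"))
print(count_matches(SAMPLE, ".//a[@id='no-such-id']"))
```

If the locator matches the saved source but Selenium still raises NoSuchElementException, the cause is more likely timing or blocking than the XPath itself.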
There is a problem with your 'BASE_URL', so try another browser to debug the issue, and also try to use an explicit wait before clicking or locating any element.
I'm trying to figure out why the following script is working when I launch it with pi user on my raspberry and not with root user.
Goal: It should open Chromium full screen, and log into the website.
With the root user, it opens the web client but doesn't display anything. The screen is white, and it gives me a "data:," page and then a Privacy error page.
#!/home/pi/Documents/raspberry_screen_chrome_script/selenium/bin/python3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
username_django = 'username'
password_django = 'password'
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("--remote-debugging-port=9222");
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_argument("--start-fullscreen")
chrome_options.add_argument("--kiosk")
driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=chrome_options)
driver.get ("my_url")
driver.find_element_by_id('id_username').send_keys(username_django)
driver.find_element_by_id('password-input').send_keys(password_django)
driver.find_element_by_id('submit-login').click()
What I used:
selenium==3.141.0
webdriver-manager==3.4.2
Chromium 88.0.4324.187 Built on Raspbian , running on Raspbian 10
ChromeDriver 88.0.4324.187
Output when launching it with root user
root@raspberrypi:/home/pi/Documents/raspberry_screen_chrome_script# ./interface_login_local.py
Traceback (most recent call last):
File "./interface_login_local.py", line 21, in <module>
driver.find_element_by_id('id_username').send_keys(username_django)
File "/home/pi/Documents/raspberry_screen_chrome_script/selenium/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in find_element_by_id
return self.find_element(by=By.ID, value=id_)
File "/home/pi/Documents/raspberry_screen_chrome_script/selenium/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
'value': value})['value']
File "/home/pi/Documents/raspberry_screen_chrome_script/selenium/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/home/pi/Documents/raspberry_screen_chrome_script/selenium/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="id_username"]"}
(Session info: chrome=88.0.4324.187)
Edit (Added Gifs to explain behaviour):
Behaviour as Pi User
Behaviour as Root User (the white block are coming when I try to right click)
Since you do not share the actual URL you are opening with this code we can only guess what is the issue.
So, it can be:
You need to add a wait / delay before accessing the username element, so that the page can finish loading before you access the element.
The element locator is possibly wrong.
Element can be inside an iframe.
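For the first point, the idea behind an explicit wait is plain polling: retry the lookup until it succeeds or a deadline passes. The sketch below shows that idea in pure Python and is not Selenium's implementation; `wait_until` and `fake_find_username` are hypothetical names, and `LookupError` stands in for NoSuchElementException.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # "Not found yet" is a reason to retry, not to fail immediately.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)

# Demo with a fake page whose element only "appears" on the third lookup.
attempts = {"count": 0}

def fake_find_username() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not present yet")
    return "username-field"

print(wait_until(fake_find_username, timeout=2.0, poll=0.01))
```

In real code, Selenium's WebDriverWait together with expected_conditions does this for you, so a hand-written loop like this is mainly useful for understanding what the wait is doing.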
I'm trying to scrape Amazon product prices. And I want to scrape prices text without opening the Chrome browser. I searched this on the Internet but it didn't help me.
This is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# Driver and link
driver = webdriver.Chrome('C:/Users/musta/Desktop/chromedriver.exe')
driver.get("https://www.amazon.com/dp/b07h9fldcd")
getText = driver.find_element_by_class_name("a-section a-spacing-micro").get_attribute("textContent")
print(getText)
driver.close()
But this didn't work. It keeps giving me this error message:
Traceback (most recent call last):
File "C:\Users\musta\Desktop\asd.py", line 8, in <module>
getText = driver.find_element_by_class_name("a-section a-spacing-micro").get_attribute("textContent")
File "C:\Users\musta\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
return self.find_element(by=By.CLASS_NAME, value=name)
File "C:\Users\musta\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Users\musta\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\musta\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".a-section a-spacing-micro"}
(Session info: chrome=91.0.4472.114)
What should I do? I'm stuck at here. I want to scrape price from given div without opening Chrome browser. Hope you understand what I mean.
I think you need a headless browser:
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("start-maximized")
driver = webdriver.Chrome(r'C:/Users/musta/Desktop/chromedriver.exe', options = options)
driver.get("https://www.amazon.com/dp/b07h9fldcd")
Regarding your error: in Selenium, the class-name locator does not work with a value that contains spaces.
So instead of this,
.a-section a-spacing-micro
use a CSS selector:
.a-section.a-spacing-micro
The code should look like this:
getText = driver.find_element_by_css_selector(".a-section.a-spacing-micro").get_attribute("innerHTML")
print(getText.strip())
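The conversion from a multi-class attribute value to a compound CSS selector is mechanical, so it can be wrapped in a tiny helper. This is just a sketch, and `classes_to_css` is a hypothetical name.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a class attribute value such as 'a b' into a selector such as '.a.b'."""
    names = class_attr.split()
    if not names:
        raise ValueError("at least one class name is required")
    # Each class gets its own leading dot; no spaces, so all classes must be on one element.
    return tag + "".join(f".{name}" for name in names)

print(classes_to_css("a-section a-spacing-micro"))  # .a-section.a-spacing-micro
```

The result can be passed straight to a CSS-selector lookup; an optional tag name, for example `classes_to_css("a-section a-spacing-micro", "div")`, narrows the match further.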
If it is just the price you want to grab, try this CSS selector:
span#price_inside_buybox
In code it would look like this:
getText = driver.find_element_by_css_selector("span#price_inside_buybox").text
print(getText.strip())
There are two questions in your query:
You can open the Chrome browser in headless mode so that no window is rendered.
Inspect the element and confirm that the id/class name you are using to locate it before calling get_attribute is correct.
I have created simple basic automation script in Python using Selenium..
Getting unwanted exception.
File:-
import pandas as pd
from pandas import ExcelWriter
from selenium import webdriver
import selenium as sel
# Data = pd.read_excel(r"C:\Users\Admin\PycharmProjects\Web_Automation_Form_Filling\challenge.xlsx",sheet_name="Sheet1")
# browser = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe')
browser = webdriver.Chrome("C:\Program Files (x86)\Google\Chrome\Application\chrome.exe");
browser.sleep(1000);
browser.get("http://www.python.org")
Error log:-
C:\Users\Admin\PycharmProjects\Web_Automation_Form_Filling\venv\Scripts\python.exe C:/Users/Admin/PycharmProjects/Web_Automation_Form_Filling/venv/Web_Auto_Filling.py
Traceback (most recent call last):
File "C:/Users/Admin/PycharmProjects/Web_Automation_Form_Filling/venv/Web_Auto_Filling.py", line 10, in <module>
browser = webdriver.Chrome("C:\Program Files (x86)\Google\Chrome\Application\chrome.exe");
File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
self.service.start()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 98, in start
self.assert_process_still_running()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 109, in assert_process_still_running
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service C:\Program Files (x86)\Google\Chrome\Application\chrome.exe unexpectedly exited. Status code was: 0
Process finished with exit code 1
Any suggestion will be appreciated..
Thanks...
Instead of the path to the Chrome application, provide the path to ChromeDriver.
More information is available on the site: https://sites.google.com/a/chromium.org/chromedriver/getting-started
Sample code :
import time
from selenium import webdriver
driver = webdriver.Chrome('/path/to/chromedriver') # Optional argument, if not specified will search path.
driver.get('http://www.google.com/');
time.sleep(5) # Let the user actually see something!
search_box = driver.find_element_by_name('q')
search_box.send_keys('ChromeDriver')
search_box.submit()
time.sleep(5) # Let the user actually see something!
driver.quit()
Download the ChromeDriver binary for your platform under the downloads section of this site
reference link to download : chrome driver
This code should work (better to use firefox for selenium):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
# noinspection PyUnresolvedReferences
import wget
DesiredCapabilities.PHANTOMJS[
"phantomjs.page.settings.userAgent"
] = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0) Gecko/20121026 Firefox/16.0"
if browser == "firefox":
    driver = webdriver.Firefox()
else:
    driver = webdriver.PhantomJS(
        CFG_phantomjs
    )  # r"D:/_devs/webserver/phantomjs-1.9.8/phantomjs.exe"
driver.get("https://tourwebsite")
username = driver.find_element_by_id("login_field")
password = driver.find_element_by_id("password")
username.clear()
The problem in your code is that you are passing the path to the Chrome executable rather than the path to chromedriver, which is a different executable.
An appropriate version of chromedriver can be downloaded from here according to your Chrome version.
For more info, you can refer to the chromedriver documentation here.
And your final code should be something like:
from selenium import webdriver
path = 'C:/Users/Avinash/Downloads/chromedriver.exe'
driver = webdriver.Chrome(path)
driver.get('http://www.google.com/');
#..here what ever you want to do with page here
driver.quit()
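Since mixing up the two executables is easy, a quick sanity check on the path can catch it before Selenium starts. This is only a sketch: `looks_like_chromedriver` is a hypothetical helper that inspects the file name and does not verify that the file exists or runs.

```python
from pathlib import PureWindowsPath

def looks_like_chromedriver(path: str) -> bool:
    """Return True if the file name (without extension) starts with 'chromedriver'."""
    # PureWindowsPath accepts both backslashes and forward slashes as separators.
    stem = PureWindowsPath(path).stem.lower()
    return stem.startswith("chromedriver")

# The browser binary from the question is rejected, the driver path is accepted.
print(looks_like_chromedriver(r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"))  # False
print(looks_like_chromedriver("C:/Users/Avinash/Downloads/chromedriver.exe"))  # True
```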
I'm just running the example code of selenium from here:
http://selenium.googlecode.com/svn/trunk/docs/api/py/index.html
The code is :
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
try:
    browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
except NoSuchElementException:
    assert 0, "can't find seleniumhq"
browser.close()
But it doesn't work for me. Here is the response:
Traceback (most recent call last):
File "test.py", line 4, in <module>
driver = webdriver.Firefox()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 62, in __init__
desired_capabilities=capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 72, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 114, in start_session
'desiredCapabilities': desired_capabilities,
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 165, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 136, in check_response
raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message:
...
<div id="content">
<p>The following error was encountered while trying to retrieve the URL: http://127.0.0.1:60106/hub/session</p>
<blockquote id="error">
<p><b>Connection to 127.0.0.1 failed.</b></p>
</blockquote>
<p id="sysmsg">The system returned: <i>(111) Connection refused</i></p>
<p>The remote host or network may be down. Please try the request again.</p>
...
You aren't running the full example. The link you posted contains the following code:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
assert "Yahoo!" in browser.title
elem = browser.find_element_by_name("p") # Find the query box
elem.send_keys("seleniumhq" + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
try:
    browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
except NoSuchElementException:
    assert 0, "can't find seleniumhq"
browser.close()
This works fine.
The edited version of the code in your question is missing some parts, and therefore fails. Specifically, you are missing these 2 lines:
elem = browser.find_element_by_name("p") # Find the query box
elem.send_keys("seleniumhq" + Keys.RETURN)
Those lines initiate a Yahoo search for "seleniumhq". The results of that search are the content in which you want to locate the element.
If you don't do the search, it will fail on:
browser.find_element_by_xpath("//a[contains(@href,'http://seleniumhq.org')]")
When Selenium launches Firefox with
browser = webdriver.Firefox()
the first address it visits is a localhost - 127.0.0.1:xxxxx
If you are using a proxy server, the localhost cannot be visited with the proxy set.
So, first you need to turn off your proxies using:
unset http_proxy
unset ftp_proxy
unset socks_proxy
unset https_proxy
Once you have turned your proxies off, your Firefox should start without any error. But now you need to set your proxies on Firefox. The technique described here works.
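If the script is started from an environment you do not control, the same effect as the `unset` commands can be approximated from Python by dropping the proxy variables before the driver is created. This is a sketch: `without_proxies` and `PROXY_VARS` are hypothetical names, the sample values are invented, and matching the variable names in any letter case is my assumption.

```python
from typing import Dict, Mapping

# The four variables unset above, compared case-insensitively.
PROXY_VARS = ("http_proxy", "https_proxy", "ftp_proxy", "socks_proxy")

def without_proxies(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the proxy variables (any letter case) removed."""
    return {key: value for key, value in env.items() if key.lower() not in PROXY_VARS}

sample = {
    "http_proxy": "http://proxy.example:3128",
    "HTTPS_PROXY": "http://proxy.example:3128",
    "HOME": "/home/user",
}
print(without_proxies(sample))  # {'HOME': '/home/user'}
```

To apply it to the running process, the matching keys can be deleted from `os.environ` before calling `webdriver.Firefox()`, so that the connection to 127.0.0.1 is not routed through the proxy.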