Running Selenium tests on xvfb: not able to connect - python

I'm trying to run a script that runs several tests using the Selenium Firefox webdriver.
It works flawlessly on a local machine, but fails miserably when run under xvfb.
The machine is a CentOS release 6.8 (Final)
Firefox version 45.6.0
I'm using Python/Marionette
The command is similar to this:
xvfb-run --server-args="-screen 0, 1920x1080x24" MyProgram
Run this way, I get several errors about the page not loading.
I took a few screenshots, and all I see is the Firefox "Unable to connect" screen.
At first I thought it could be proxy related, even though I had not configured a proxy explicitly and a simple "wget" worked as expected.
So I forced the Firefox preference in the code to make sure it doesn't use the proxy:
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 0)
Same result.
So I googled for similar situations and found some answers suggesting adding the display number to the command line.
So I changed the command line to this:
export DISPLAY=:1
xvfb-run --server-args=":1 -screen 0, 1920x1080x24" MyProgram
Then I got a different error, but it still does not work:
ERROR: WebDriverException: connection refused Traceback (most recent call last):
I have also tried to log more information by adding the -e parameter to xvfb-run, but all I get is an empty file.
Any idea what else I can try to make it work?
* UPDATE *
Here's a small script to reproduce the issue:
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import *
display = Display(visible=0, size=(1920, 1080))
display.start()
profile = webdriver.FirefoxProfile()
profile.set_preference("network.http.phishy-userpass-length", 255);
profile.set_preference("network.proxy.type", 0)
capabilities = None
# Marionette not necessary as it's Firefox 45
# capabilities = DesiredCapabilities.FIREFOX
# capabilities["marionette"] = True
print("Getting webdriver...")
browser = webdriver.Firefox(firefox_profile=profile, capabilities=capabilities)
print("Requesting URL...")
browser.get('https://www.google.com')
print("TITLE:", browser.title)
browser.quit()
display.stop()
The output:
Getting webdriver...
Requesting URL...
TITLE: Problem loading page
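One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: command-line tools such as wget read proxy settings from environment variables like http_proxy, whereas network.proxy.type set to 0 tells Firefox to connect directly with no proxy. If this machine can only reach the internet through a proxy, that difference alone could produce the "Unable to connect" page. The sketch below (the helper name proxies_from_env is hypothetical) lists the proxy variables visible to the process:

```python
import os


def proxies_from_env(environ):
    """Return {scheme: proxy_url} for every non-empty *_proxy variable.

    Names are matched case-insensitively, so both http_proxy and
    HTTP_PROXY are picked up. no_proxy is reported under the key "no".
    """
    suffix = "_proxy"
    found = {}
    for name, value in environ.items():
        lowered = name.lower()
        if lowered.endswith(suffix) and value:
            found[lowered[: -len(suffix)]] = value
    return found


if __name__ == "__main__":
    # Run this from the same shell that launches xvfb-run.
    settings = proxies_from_env(os.environ)
    if settings:
        print("Proxy variables found:", settings)
    else:
        print("No proxy variables set in this environment.")
```

If variables show up here, leaving network.proxy.type at 0 bypasses them, and Firefox would probably need the proxy configured explicitly instead (for example network.proxy.type set to 1 together with the matching host and port preferences).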

Related

Selenium: AttributeError: 'Options' object has no attribute 'set_headless'

I want to write a Python script using Selenium to scrape a website. Following along with the Real Python article on it, I copied and pasted the following code into a .py file:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
opts = Options()
opts.set_headless()
assert opts.headless # Operating in headless mode
browser = Firefox(options=opts)
browser.get('https://duckduckgo.com')
Running the script I get the following error:
opts.set_headless()
AttributeError: 'Options' object has no attribute 'set_headless'
I then followed this article, commented out the opts.set_headless() call and added opts.headless = True, but now I get the following error:
Traceback (most recent call last):
  File "/home/usr/local/folder/scraper.py", line 10, in <module>
    browser = Firefox(options=opts)
  File "/home/usr/local/folder/scraper/venv/lib/python3.10/site-packages/selenium/webdriver/firefox/webdriver.py", line 192, in __init__
    self.service.start()
  File "/home/usr/local/folder/scraper/venv/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 106, in start
    self.assert_process_still_running()
  File "/home/usr/local/folder/scraper/venv/lib/python3.10/site-packages/selenium/webdriver/common/service.py", line 119, in assert_process_still_running
    raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
selenium.common.exceptions.WebDriverException: Message: Service geckodriver unexpectedly exited. Status code was: -6
I verified that the geckodriver is located in my $PATH, so I have no idea why none of this is working. I am using selenium v4.7.2.
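For reference, the $PATH check mentioned above can also be done from Python, which shows exactly which binary the interpreter resolves. This is a minimal standard-library sketch; the function name locate_binary is hypothetical:

```python
import shutil


def locate_binary(name, search_path=None):
    """Return the full path of an executable found on the search path, or None.

    search_path defaults to the PATH environment variable of this process.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    for binary in ("geckodriver", "firefox"):
        location = locate_binary(binary)
        print(f"{binary}: {location or 'not found on PATH'}")
```

Running it inside the same virtual environment and shell as the scraper matters, because PATH can differ between an interactive terminal and an IDE or cron job.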
After much hair pulling, I was able to determine that almost all articles on the internet dealing with Selenium use deprecated methods and attributes. Hopefully this answer will help many others who have been trying to use this library.
First, the .set_headless() method is gone and no longer works. The Python Forums had a helpful discussion around it. To run a headless browser, use .add_argument("--headless") instead.
Second, there is now the Service() class that needs to be imported and used for any executable_path= pointing to the geckodriver and any other paths such as logs. These two posts helped on this matter: stackoverflow_1 and stackoverflow_2.
Third, after fixing the code and using the correct modules, attributes, methods and arguments, it was still getting hung up. The logs pointed to a socket timeout and an issue that the dev team was dealing with in Sep 2022. This helped me realize that the geckodriver version linked in the original Real Python article I was using was long outdated and needed to be updated to the latest version, which is v0.32.0 at the time of writing.
However, that wasn't why it was getting hung up. I decided to comment out the headless argument, and that showed that the Firefox browser was the issue. Apparently, on Ubuntu 22.04, Firefox is installed by default as a snap and needs to be installed from a .deb package instead. Here is a good article explaining it.
So ultimately there were several different issues. The library is constantly being updated, and the older features that most articles on the internet still use are now deprecated. The Selenium documentation isn't the greatest either. Here is my final code with the previous issues commented out:
# from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
# Setup--
options = Options()
options.add_argument("--headless")
service = Service(executable_path="/home/$PATH/location/geckodriver.exe", log_path="/home/file/location/log/geckodriver.log")
# caps = webdriver.DesiredCapabilities().FIREFOX
# caps["marionette"] = True
browser = Firefox(service=service, options=options)
### Deprecated
# browser = webdriver.Firefox(firefox_profile=options, capabilities=caps, executable_path="~/bin/geckodriver.exe")
# Parse--
browser.get('https://duckduckgo.com')
# find_element returns a single WebElement, so it is not indexed
logo = browser.find_element(by=By.ID, value='logo_homepage_link')
print(logo.text)
browser.quit()
This should work:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
opts = Options()
opts.headless = True
# Raw string so the backslashes in the Windows path are not treated as escapes
browser = Firefox(options=opts, executable_path=r'C:\TheActualPathToThe\geckodriver.exe')
browser.get('https://duckduckgo.com')

contentEncodingError when opening website in Firefox using selenium and python3

I'm trying to perform a fairly simple action using Selenium, namely opening Google Images in the Firefox browser.
I also use a proxy server running on the localhost.
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.proxy import Proxy, ProxyType
options = FirefoxOptions()
service = Service()
options.add_argument("--headless")
options.accept_insecure_certs = True
proxy = Proxy({
    'httpProxy': proxy_addr,
    'sslProxy': proxy_addr,
    'proxyType': ProxyType.MANUAL
})
options.proxy = proxy
b = Firefox(service=service, options=options)
b.execute("get", {'url': 'http://images.google.com'})
But unfortunately, I'm getting an error like this:
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=contentEncodingError&u=https%3A//images.google.com/%3Fgws_rd%3Dssl&c=UTF-8&d=The%20page%20you%20are%20trying%20to%20view%20cannot%20be%20shown%20because%20it%20uses%20an%20invalid%20or%20unsupported%20form%20of%20compression.
I would be very grateful for any thoughts or advice on what exactly the problem might be, or at least roughly what I should be looking at.
I'm using:
debian
firefox-esr
selenium == 4.2.0
geckodriver-v0.31.0
This error message...
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=contentEncodingError&u=https%3A//images.google.com/%3Fgws_rd%3Dssl&c=UTF-8&d=The%20page%20you%20are%20trying%20to%20view%20cannot%20be%20shown%20because%20it%20uses%20an%20invalid%20or%20unsupported%20form%20of%20compression.
...implies that there is a configuration mismatch when GeckoDriver initiates/spawns a new Browsing Context, i.e. a Firefox session.
Solution
As per the Mozilla support docs you need to try out the following steps:
Try to reset the network.http.accept-encoding prefs on the about:config page in case they show as user set (bold). You can open the about:config page via the location/address bar. You can accept the warning and click "I'll be careful" to continue.
If you have Avast Antivirus or Malwarebytes installed, you may need to disable them on the test machine before executing the tests.
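When diagnosing this kind of failure it can help to decode the about:neterror URL from the exception message, since it carries the error code, the failing URL and the description as percent-encoded query parameters. A minimal standard-library sketch (the function name parse_neterror is hypothetical, and the sample URL is shortened from the one in the question):

```python
from urllib.parse import parse_qs


def parse_neterror(url):
    """Split an about:neterror URL into its decoded query parameters.

    Returns a dict such as {"e": <error code>, "u": <failing URL>, ...}.
    Raises ValueError if the URL is not an about:neterror page.
    """
    prefix, _, query = url.partition("?")
    if prefix != "about:neterror":
        raise ValueError(f"not an about:neterror URL: {url!r}")
    # parse_qs percent-decodes values and returns lists; keep the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    sample = (
        "about:neterror?e=contentEncodingError"
        "&u=https%3A//images.google.com/%3Fgws_rd%3Dssl&c=UTF-8"
    )
    info = parse_neterror(sample)
    print(info["e"])  # contentEncodingError
    print(info["u"])  # https://images.google.com/?gws_rd=ssl
```

Here the u parameter shows the HTTPS address rather than the http:// URL that was requested, which suggests the failing response arrived after a redirect and therefore through whatever is configured as sslProxy. That reading is an inference from the URL, not something the error page states.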

WebDriverError#chrome instead of WebDriverError#firefox while using Firefox browser with selenium

I'm trying to write several tests using selenium, but I'm seeing the following strange behavior.
When I run the tests like this:
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.firefox.service import Service
options = FirefoxOptions()
service = Service()
brow = Firefox(service=service, options=options)
brow.execute("get", {'url': 'https://python.org'})
I get the result I expected: the python.org website opens in the Firefox browser.
But if I make a mistake in the URL, I get the following error:
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.firefox.service import Service
options = FirefoxOptions()
service = Service()
brow = Firefox(service=service, options=options)
brow.execute("get", {'url': 'qwerty'})
selenium.common.exceptions.InvalidArgumentException: Message: Malformed URL: URL constructor: qwerty is not a valid URL.
Stacktrace:
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:186:5
InvalidArgumentError#chrome://remote/content/shared/webdriver/Errors.jsm:315:5
GeckoDriver.prototype.navigateTo#chrome://remote/content/marionette/driver.js:804:11
I just want to understand why I see WebDriverError#chrome here, and not WebDriverError#firefox or something like that.
Is this a bug, or am I doing something wrong?
These error messages...
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:186:5
InvalidArgumentError#chrome://remote/content/shared/webdriver/Errors.jsm:315:5
GeckoDriver.prototype.navigateTo#chrome://remote/content/marionette/driver.js:804:11
containing the phrase #chrome may give the impression of strange behavior when using the GeckoDriver and Firefox combination.
However, as per @AutomatedTester's comment in the GitHub discussion Selenium 3.4.0-GeckoDriver 0.17.0 : GeckoDriver producing logs through Chromium/Chrome modules #787:
These errors are nothing to worry about. Mozilla uses different open source projects to build Firefox for different reasons. It showing Chrome errors means nothing in the big picture.
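So you can safely ignore them. In Firefox, chrome:// is the internal URL scheme for the browser's own privileged code and user interface; it has nothing to do with Google Chrome.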
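To make the structure of those stack lines explicit, each frame can be split into a function name, a source URL and a position. In the frames quoted above, "chrome" is simply the scheme of that source URL. The sketch below uses only the standard library; the names FRAME_RE and parse_frame are hypothetical, and the frame layout is inferred from the three lines in the question:

```python
import re
from urllib.parse import urlsplit

# One frame looks like "<function><sep><source URL>:<line>:<column>".
# The separator is "#" in the traces quoted above; "@" is accepted as well.
FRAME_RE = re.compile(
    r"^(?P<function>[^@#]*)[@#](?P<source>.+):(?P<line>\d+):(?P<column>\d+)$"
)


def parse_frame(frame):
    """Split one stack frame into function, source URL, scheme and position."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        raise ValueError(f"unrecognized stack frame: {frame!r}")
    source = match["source"]
    return {
        "function": match["function"],
        "source": source,
        "scheme": urlsplit(source).scheme,
        "line": int(match["line"]),
        "column": int(match["column"]),
    }


if __name__ == "__main__":
    frame = "GeckoDriver.prototype.navigateTo#chrome://remote/content/marionette/driver.js:804:11"
    parsed = parse_frame(frame)
    print(parsed["function"])  # GeckoDriver.prototype.navigateTo
    print(parsed["scheme"])    # chrome
```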

Python, Selenium, Chromedriver & TOR - website not rendering correctly

I managed to get Python Selenium working with Tor in order to anonymize my IP address, using the following code:
from selenium import webdriver
import os
torexe = os.popen(r'.\tor\tor.exe')
PROXY = "socks5://localhost:9050" # IP:PORT or HOST:PORT
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=%s' % PROXY)
prefs = {"profile.managed_default_content_settings.images": 1,
"javascript.enabled": True}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=options, executable_path=r'../../ChromeDriver.LENOVO/chromedriver.exe')
driver.get("https://asdfiles.com")
It is slow, but that's OK: it is expected. And it works fine: with icanhazip.com I can check that I am browsing with another IP. However, when I try to load https://asdfiles.com/ the page does not render correctly, as in the screenshot below:
When I load it in Tor Browser, it renders correctly:
The problem is that there is a button I need to click using Selenium, and when I do that nothing happens. I tried to inspect the code and found nothing. I also tried to enable JavaScript in chromedriver, and that did not work either.
Any suggestions on how to tackle this?
I am running Python 3.8.8 on Windows 10;
When I run tor.exe it says: Tor 0.3.2.10 running on Windows 8 with Libevent 2.0.22-stable, OpenSSL 1.0.2n, Zlib 1.2.8, Liblzma N/A, and Libzstd N/A;
Chrome 96.0.4664.45 (Official Build) (64-bit)
Cheers!

Running Selenium without GUI : Status code 64

I'm trying to run Selenium in headless mode on a Linux machine without a GUI. The problem is that I'm getting a WebDriverException and I can't find anywhere what the status code 64 means.
Does anyone know where to find the status code definitions?
Code :
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(1024, 768))
display.start()
path = '/home/workspace/geckodriver'
driver = webdriver.Firefox(executable_path=path, service_args=['--verbose', '--log-path=/tmp/firefox.log'])
# website testing functionality:
driver.get('https://python.org')
print(driver.title)
Error :
WebDriverException: Message: Service /home/workspace/geckodriver unexpectedly exited. Status code was: 64
I'm not sure what the status code means, but try updating the Firefox webdriver (geckodriver). Updating it fixed the problem for me.
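On the question of where such codes are defined: the number in "Status code was: N" is the driver process's return code, and two general conventions usually explain it. On POSIX a negative value -N means the process was killed by signal N, and small positive values in the 64 to 78 range often follow the BSD sysexits.h convention, where 64 is EX_USAGE. If geckodriver follows that convention here (an assumption, not verified against this geckodriver version), 64 would point at the command-line arguments passed through service_args being rejected. A rough sketch of a decoder, with the hypothetical name describe_exit_status and only a few sysexits entries:

```python
import signal

# A few codes from the BSD sysexits.h convention (not an exhaustive list).
SYSEXITS = {
    64: "EX_USAGE: the command was used incorrectly (bad arguments or flags)",
    69: "EX_UNAVAILABLE: a required service is unavailable",
    70: "EX_SOFTWARE: internal software error",
}


def describe_exit_status(code):
    """Explain a subprocess return code as reported by Popen.poll()."""
    if code < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        number = -code
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown"
        return f"terminated by signal {number} ({name})"
    if code in SYSEXITS:
        return f"exited with status {code} ({SYSEXITS[code]})"
    return f"exited with status {code}"


if __name__ == "__main__":
    for status in (64, -6, 1):
        print(status, "->", describe_exit_status(status))
```

Running geckodriver by hand with the same arguments, outside Selenium, is a quick way to see the usage message it prints, if any.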
