Calling the Firefox webdriver in python - python

I'm using Debian 9 Stretch and Pycharm IDE and trying to learn web-scramping; I installed the Selenium package simply by running:
pip install selenium
and the Firefox webdriver by running:
wget https://github.com/mozilla/geckodriver/releases/download/v0.19.1/geckodriver-v0.19.1-linux64.tar.gz
tar -xvzf geckodriver-v0.19.1-linux64.tar.gz.1
chmod +x geckodriver
respectively, to download the last release, extract it and make the driver executable. After that, I added the driver to the following path:
usr/local/bin
I ran all by using the Pycharm IDE terminal and not the built-in Debian terminal.
In order to open Firefox and web-scrape, I run:
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
webdriver.Firefox(executable_path="/usr/local/bin/geckodriver")
The last line gives an error message as output:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/root/PycharmProjects/Example/venv/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 162, in __init__
keep_alive=True)
File "/root/PycharmProjects/Example/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/root/PycharmProjects/Example/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/root/PycharmProjects/Example/venv/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
self.error_handler.check_response(response)
File "/root/PycharmProjects/Example/venv/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: connection refused
I'm a newbie both in Python and in web-scraping; please, could someone explain what does go wrong with installation and coding and why I got this error?
In the hope to be clear asking the question, I thanks all in advance for the help.

Assuming that the location of the geckodriver is correct, you can check below:
properties of the geckodriver should have the correct permission for the user. You would need to check the box "Allow this file to run as a program" or
if you you have restricted access, save the geckodriver in your home/username/geckodriver then path it to your firefox. Saving it in your home folder will be able to modify the properties of your geckodriver.
[EDIT] Are your running in command line? If so, you need a virtual display, I have used pyvirtualdisplay:
from pyvirtualdisplay import Display
display = Display(visible=0,size(800, 600))
display.start()
driver = webdriver.Firefox(executable_path="/usr/local/bin/geckodriver")

The problem has been solved by removing Mozilla Firefox, that, in Debian 9 Stretch is installed as ESR (Extended Release Support) by default; at the time the Firefox ESR version was 52.0.
After, I installed the by instaling the unstable Firefox version (not Beta) by running on the the terminal as super-user:
su -
gedit /etc/apt/sources.list
and adding deb http://ftp.it.debian.org/debian/ unstable main to the sources list file.
After, I ran:
apt-get update
apt-get install -t unstable firefox
to update the software and install Firefox.
By following the guidelines explained in the question to install and run the selenium Python package everything should work fine (at least, for me!).
Hope this will help other users too!

While woking with Selenium-Python Client v3.10.0 along with GeckoDriver v0.19.1 and Firefox v58.0.2 you have to initialize the WebDriver instance and assign it to a variable, which will in-turn initialize the Web Browser which will in-turn open the desired URL as follows :
from selenium import webdriver
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
driver.get('https://www.google.co.in')
print("Page Title is : %s" %driver.title)
driver.quit()

Related

WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome browser (1)

I've upgraded Ubuntu to 20.04. It seems that Chrome isn't available as APT package but via snap. After the upgrade I'm getting the error while trying to instantiate Chrome browser:
>>> from selenium.webdriver import Chrome
>>> from selenium.webdriver.chrome.options import Options
>>> o = Options()
>>> o.headless = True
>>> o.add_argument('--disable-dev-shm-usage')
>>> o.add_argument('--no-sandbox')
>>> Chrome(options=o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/var/www/order/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
RemoteWebDriver.__init__(
File "/var/www/order/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/var/www/order/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/var/www/order/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/var/www/order/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: DevToolsActivePort file doesn't exist
chromedriver version is 87.0.4280.20
chromium-browser version is Chromium 87.0.4280.66 snap
I read this discussion. The case is however isn't same. I run python as a regular user. However I've disabled dev-shm-usage and sandbox. But still it doesn't work. It worked before I've upgraded to Ubuntu 20.04. So I assume it has something to do with snap version of Chrome.
I have found out following configuration works:
Start first Chrome chromium-browser --headless --remote-debugging-port=9222 and then:
>>> from selenium.webdriver import Chrome
>>> from selenium.webdriver.chrome.options import Options
>>> o = Options()
>>> o.add_experimental_option('debuggerAddress', 'localhost:9222')
>>> b = Chrome(options=o)
>>> b.get('https://google.com')
>>> b.title
'Google'
>>>
So it seems the problem is with starting the browser.
Opened a bug report
Good explanation was provided by ChromeDriver development team in a response to the ticket I've submitted:
ChromeDriver uses the /tmp directory to communicate with Chromium, but
Snap remaps /tmp directory to a different location (specifically, to
/tmp/snap.chomium/tmp). This causes errors because ChromeDriver can't
find files created by Chromium. ChromeDriver is designed and tested
with Google Chrome, and it may have compatibility issues with
third-party distributions.
There are a couple of workarounds:
Tell ChromeDriver to use a location in your home directory, instead of /tmp. For example, if your home directory is /home/me, you can add
the following line of code to your script:
o.add_argument('--user-data-dir=/home/me/foo')
Explicitly select a port for ChromeDriver to communiate with Chromium. You need to carefully select a port (e.g., 9222) that isn't
already in use, and then add the following line to your script:
o.add_argument('--remote-debugging-port=9222')
and
Another (probably better) workaround is to use the ChromeDriver
installed by Snap, which is at /snap/bin/chromium.chromedriver. E.g.,
driver = Chrome('/snap/bin/chromium.chromedriver', options=o)
Since this version of ChromeDriver runs inside Snap, its /tmp
directory is redirected in the same way as Chromium
The solution to my problem is rather workaround but nevertheless it might be useful.
I have downgraded to Chrome version 86, which I have installed from deb package instead of snap. Installation instructions can be found here
Once I've downgraded chromium and chromedriver to version 86 the problem has gone.
The answer provided by Ralfeus is wholesome, I am using Ubuntu 20.04 and was facing same issues.
I used driver = Chrome('/snap/bin/chromium.chromedriver') , yeah without any additional options and it worked.

Selenium Geckodriver and ChromeDriver not working

Running OS 10.12.6 Selenium with Python 3.6 bindings
Despite my best efforts I can't seem to get either working with Selenium. Here's the error I get:
Geckodriver error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 74, in start
stdout=self.log_file, stderr=self.log_file)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'geckodriver': 'geckodriver'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/christopher.gaboury/Desktop/Scripts/safariExecutive.py", line 11, in <module>
browser = webdriver.Firefox()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 148, in __init__
self.service.start()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 81, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
The error for both chromedriver and geckodriver are essentially the same.
I've manually set my path to where these are located. Same error. I've moved the drivers to a location already in the path. Same error. Ive removed the two versions I downloaded and installed both drivers via Homebrew. Same errors. I'm not sure what to do next.
Homebrew needed to link the drivers. Once this was done they worked perfectly.
I am able to use chromedriver now without a problem. I set chromedriver.exe in a file folder renamed chromedriver, then I make sure the filepath is in parentheses. I however have the same problem with opening geckodriver through selenium despite the fact that pathlib.Path recognizes that the geckodriver.exe file exists. See if you can get chromedriver working this way:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser=webdriver.Chrome(r'c:\path_to_chromedriver\chromedriver\chromedriver.exe')
As in the last line above, you should make certain that you include the file path in the form of a raw string (the raw string making sure that Python doesn't read the backslashes as escape characters). If you do ever figure out how to solve your problem with geckodriver (as I have tried adding it to the system Path through advanced settings yet it didn't work), please let me know, but that should get Selenium working with Chrome.
I have created new bash script which installs automatically geckodriver, firefox, python, selenium and tests the configuration.
#!/bin/bash
apt update && apt install -y firefox python3 python3-pip
pip install selenium
INSTALL_DIR="/usr/local/bin"
json=$(curl -s https://api.github.com/repos/mozilla/geckodriver/releases/latest)
input=$(echo "$json" | jq -r '.assets[].browser_download_url | select(contains("linux64"))')
urls=$(echo $input | tr " " "\n")
for addr in $urls
do
url=$addr
break
done
curl -s -L "$url" | tar -xz
chmod +x geckodriver
sudo mv geckodriver "$INSTALL_DIR"
echo "installed geckodriver binary in $INSTALL_DIR"
python3 -c "from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get(\"http://google.com/\")
print (\"Headless Firefox Initialized\")
driver.quit()"

geckodriver executable needs to be in path

I have read previous questions asked on this topic and tried to follow the suggestions but I continue to get errors. On terminal, I ran
export PATH=$PATH:/Users/Conger/Documents/geckodriver-0.8.0-OSX
I also tried
export PATH=$PATH:/Users/Conger/Documents/geckodriver
When I run the following Python code
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
firefox_capabilities['binary'] = '/Users/Conger/Documents/Firefox.app'
driver = webdriver.Firefox(capabilities=firefox_capabilities)
I still get the following error
Python - testwebscrap.py:8
Traceback (most recent call last):
File "/Users/Conger/Documents/Python/Crash_Course/testwebscrap.py", line 11, in <module>
driver = webdriver.Firefox(capabilities=firefox_capabilities)
File "/Users/Conger/miniconda2/lib/python2.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 135, in __init__
self.service.start()
File "/Users/Conger/miniconda2/lib/python2.7/site-packages/selenium/webdriver/common/service.py", line 71, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x1006df6d0>> ignored
[Finished in 0.194s]
you may downgrade your selenium by
pip install selenium==2.53.6
This has solved my issue
On mac:
brew install geckodriver
Homebrew is the most popular package manager for Mac OS X, you will need install XCode on your mac and it will be then accesible from your terminal.
You can follow this tutorial if required
First we know that gekodriver is the driver engine of Firefox,and we know that
driver.Firefox() is used to open Firefox browser, and it will call the gekodriver engine ,so we need to give the gekodirver a executable permission.
so we download the latest gekodriver uncompress the tar packge ,and put gekodriver at the /usr/bin/
ok,that's my answer and i have tested.
I just downloaded the latest version geckodriver (I have win7) from here and added that exe-file in python directory (which already in PATH)
export path works only in the terminal you have entered the command. If you try to run the script from a different terminal, you will get the same error.

launch selenium from python on ubuntu

I have the following script
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://localhost:8000')
assert 'Django' in browser.title
I get the following error
$ python3 functional_tests.py
Traceback (most recent call last): File "functional_tests.py", line 3, in <module>
browser = webdriver.Firefox() File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/firefox/webdriver.py", line 80, in __init__
self.binary, timeout) File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/firefox/extension_connection.py", line 52, in __init__
self.binary.launch_browser(self.profile, timeout=timeout) File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/firefox/firefox_binary.py", line 68, in launch_browser
self._wait_until_connectable(timeout=timeout) File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/firefox/firefox_binary.py", line 99, in _wait_until_connectable
"The browser appears to have exited " selenium.common.exceptions.WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.
pip3 list shows selenium (2.53.6).
firefox -v shows Mozilla Firefox 47.0.
I struggled with this problem as well, and I was unhappy with having to use older versions of Firefox. Here's my solution that uses the latest version of Firefox. It however involves several steps
Step 1. Download v0.9.0 Marionette, the next generation of FirefoxDriver, from this location: https://github.com/mozilla/geckodriver/releases/download/v0.9.0/geckodriver-v0.9.0-linux64.tar.gz
Step 2. Extract the file to a desired folder, and rename it to "wires". In my case I created a folder named "add_to_system_path" under Documents. So the file is in Documents/add_to_system_path/wires (also make sure that the wires file is executable under its properties)
Step 3. Create a file named ".pam_environment" under your home folder, and then adding this line on it and save
PATH DEFAULT=${PATH}:/absolute/path/to/the/folder/where/wires/is/saved
What this does is it tells ubuntu to add the enumerated dir in .pam_environment to your system path
Step 4. Save the file, log out of your user session, and log back in. This is necessary to do so that the files in the newly added system path is recognized by ubuntu
Step 5. Use the code below to instantiate the browser instance:
`
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
capabilities = DesiredCapabilities.FIREFOX
capabilities["marionette"] = True
browser = webdriver.Firefox(capabilities=capabilities)
browser.get('http://your-target-url')`
Firefox should now be able to instantiate as usual.
The last version of Firefox is not working properly with selenium. Try with 46 or 45.
You can download here: ftp.mozilla.org/pub/firefox/releases
or sudo apt-get install firefox=45.0.2+build1-0ubuntu1
You can also do this graphically as shown here http://www.howtogeek.com/117929/how-to-downgrade-packages-on-ubuntu/

Selenium's firefox webdriver works in one machine but not another

Here is the very simple code that I am running:
from selenium import webdriver
driver = webdriver.Firefox()`
Here is the error that it causes on a google compute instance:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 77, in __init__
self.binary, timeout),
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/extension_connection.py", line 49, in __init__
self.binary.launch_browser(self.profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.py", line 68, in launch_browser
self._wait_until_connectable()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.py", line 98, in _wait_until_connectable
raise WebDriverException("The browser appears to have exited "
selenium.common.exceptions.WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary construct
or, check it for details.
On both machines I have
Python 2.7.6
Mozilla Firefox 42.0
locate selenium returns ... /usr/local/lib/python2.7/dist-packages/selenium-2.48.0.dist-info/DESCRIPTION.rs ...
So, Selenium 2.48.0
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
One machine is run on a google compute server (it errors) and one is on a VM VirtualBox (it works), but this is the only difference that I can find and it shouidn't matter.
What other differences are there that could cause this error on one machine but not the other?
NOTE:
I am thinking maybe the google compute engine cannot open a browser window since you can only ssh into a command line? Then this problem can't be solved?
NOTE:
This code works on both machines:
from selenium import webdriver
driver = webdriver.PhantomJS
But, I need to use firefox, so this is not a solution just another thing to keep in mind.
As you noted the reason could be that you are running on an headless system. PhantomJS and HTMLUnit and stuf like this don't requiere to have a x-server.
Can you try to start firefox on your commandline just typing firefox.
If that fails with an exception like Can't find/open display on 0.0 or smth. like this, you should use XVFB:
Here is a description how to use XVFB.
sudo apt-get update
sudo apt-get install firefox xvfb
sudo Xvfb :10 -ac
export DISPLAY=:10
Now you can try to start firefox with firefox
The commands i copied from:
http://www.installationpage.com/selenium/how-to-run-selenium-headless-firefox-in-ubuntu/
If you like to set the DISPLAY Port within your java application and just for your firefox instance you can do it like this:
FirefoxBinary firefoxBinary = new FirefoxBinary();
firefoxBinary.setEnvironmentProperty("DISPLAY", ":10");
new FirefoxDriver(firefoxBinary, new FirefoxProfile());
I need to run 'gcloud init' then login as default. See document at step 1: https://cloud.google.com/compute/docs/gcloud-compute/

Categories