I used the code below on Google Colab and installed the Selenium package:
# !pip install selenium
# !apt-get update # to update ubuntu to correctly run apt install
# !apt install chromium-chromedriver
# !cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://pooya.um.ac.ir/gateway/PuyaAuthenticate.php")
It just keeps loading for a while and doesn't work at all.
How can I fix it so I can access any website?
The following function works. The !apt ... commands are the important part of using Selenium in Google Colab.
!apt update
!apt install chromium-chromedriver
!pip install selenium
from selenium import webdriver
def driversetup():
    options = webdriver.ChromeOptions()
    # run Selenium in headless mode
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    # overcome limited resource problems
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("lang=en")
    # open the browser in maximized mode
    options.add_argument("start-maximized")
    # disable infobars
    options.add_argument("disable-infobars")
    # disable extensions
    options.add_argument("--disable-extensions")
    options.add_argument("--incognito")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options)
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined});")
    return driver
driver = driversetup()
driver.get('https://www.google.com')
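As a quick sanity check (the lines below are an addition, not part of the original answer), you can confirm the page actually loaded and then release the browser:
print(driver.title)   # should print the loaded page's title, e.g. "Google"
driver.quit()         # free the headless browser when you're done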
I'm creating a webscraper package and am in the process of getting it running on Docker. While I can build the image to my local Docker repository, I cannot run the script without the following errors appearing:
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Here is what I have in the main script so far to try and get it running on Docker:
def __init__(self, url: str = " url goes here ",
             options: Optional[ChromeOptions] = None):  # default url
    options = ChromeOptions()
    self.driver = Chrome(ChromeDriverManager().install(), options=options)
    options.add_argument("--no-sandbox")
    options.binary_location = '/usr/bin/google-chrome'
    options.add_argument("--headless")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-setuid-sandbox")
    options.add_argument("--remote-debugging-port=9222")
    options.add_argument("start-maximized")
    options.add_argument('--disable-gpu')
    options.add_argument("window-size=1920,1080")
From other posts I noted that for some people it was as simple as changing the order of the options.add_argument calls, which I have tried, but it doesn't work for me.
I also have the following imports in the same script:
import os
import selenium
from selenium.webdriver import Chrome
from webdriver_manager.chrome import ChromeDriverManager #installs Chrome webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from typing import Optional
import time
import boto3
from sqlalchemy import create_engine
import urllib.request
import tempfile #temporary directory - to be removed after all operations have finished
In my Dockerfile:
FROM python:3.8
#Set Chrome Repo
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list' \
    && apt-get -y update \
    # Install Chrome
    && apt-get install -y google-chrome-stable \
    && wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip \
    && apt-get install -yqq unzip \
    && unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
COPY . .
RUN pip install -r requirements.txt
#When we run the container, this will be the command run
CMD ["python", "scraper/webscraper.py"]
Just in case, I am using Docker and VSCode on Windows OS.
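For reference, the build-and-run sequence being described would look roughly like this (the image name is a placeholder, not taken from the post):
docker build -t webscraper .
docker run --rm --shm-size=2g webscraper   # --shm-size gives headless Chrome extra shared memory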
I've realised that by changing the order of the options.add_argument calls and self.driver, the script runs just fine. The problem was that the driver was being created before the options were set, when it should be the other way around, as follows:
def __init__(self, url: str = " url goes here ",
             options: Optional[ChromeOptions] = None):  # default url
    options = ChromeOptions()
    options.add_argument("--no-sandbox")
    options.binary_location = '/usr/bin/google-chrome'
    options.add_argument("--headless")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-setuid-sandbox")
    options.add_argument("--remote-debugging-port=9222")
    options.add_argument("start-maximized")
    options.add_argument('--disable-gpu')
    options.add_argument("window-size=1920,1080")
    self.driver = Chrome(ChromeDriverManager().install(), options=options)
I've been using Selenium in a Jupyter notebook (with the Chrome webdriver) for a while. In Jupyter, a new automated window opens and I can watch my code at work, which is a great utility when the automation involves things like selecting an option from a drop-down.
But when using Selenium (with the Chrome webdriver) on Google Colab, no automated tab or window opens and I can't really see what my code is doing. It feels like being in a dark cave without a torch.
Can anyone tell me how I can watch the automated tab while using Selenium on Colab?
This is what I have tried so far:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.python.org")
and this too
!pip install selenium
!apt-get update
!apt install chromium-chromedriver
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver =webdriver.Chrome('chromedriver',chrome_options=chrome_options)
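No answer to this question is included here, but one common workaround (an assumption on my part, not the poster's solution) is to save screenshots from the headless browser and display them inline in the Colab notebook:
# Illustrative sketch only: inspect what headless Chrome is doing by capturing screenshots.
# Assumes the Colab setup above (chromium-chromedriver installed, wd created) has already run.
from IPython.display import Image, display

wd.get("https://www.python.org")
wd.save_screenshot("page.png")   # capture the current browser viewport to a PNG file
display(Image("page.png"))       # render the capture inline in the notebook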
In Debian 10, the Firefox driver package was removed from Synaptic because of incompatibility with Firefox ESR. I downloaded the latest geckodriver from GitHub and installed the latest Firefox (using flatpak). I get this error message when calling webdriver.Firefox():
SessionNotCreatedException: Unable to find a matching set of capabilities
How can I run Firefox from Python?
Versions
Firefox 80.0.1 (installed from flatpak)
geckodriver 0.27 (in /usr/local/bin/)
Selenium 3.14.1
Python 3.7.3
Script:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
I am trying to run the Selenium webdriver (Python) with Chrome on Google Colab. First I had trouble passing the chromedriver.exe path in the call selenium.webdriver.Chrome('/chromedriver.exe'); once past that, I kept hitting a failure about not having permission to run chromedriver.exe, even though the version is correct. Does anyone know what could be wrong?
WebDriverException: Message: 'chromedriver.exe' executable may have wrong permissions.
You can do it by installing the Chromium webdriver and adjusting some options so that it does not crash in Google Colab:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.webite-url.com")
I'm trying to use Chromedriver with Ubuntu (an AWS instance). I've gotten Chromedriver to work with no problem on a local instance, but I'm having many, many issues doing so on a remote instance.
I'm using the following code:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Chrome(executable_path='/usr/bin/chromedriver', chrome_options=options)
However, I keep getting this error:
Traceback (most recent call last):
File "test.py", line 39, in <module>
driver = webdriver.Chrome()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'chromedriver'
I believe I'm using the most updated version of Selenium, Chrome, and Chromedriver.
Chrome version is: Version 78.0.3904.70 (Official Build) (64-bit)
Selenium:
ubuntu#ip-172-31-31-200:/usr/bin$ pip3 show selenium
Name: selenium
Version: 3.141.0
Summary: Python bindings for Selenium
Home-page: https://github.com/SeleniumHQ/selenium/
Author: UNKNOWN
Author-email: UNKNOWN
License: Apache 2.0
Location: /home/ubuntu/.local/lib/python3.6/site-packages
Requires: urllib3
And, finally, for Chromedriver, I'm almost certain I downloaded the most recent version here: https://chromedriver.storage.googleapis.com/index.html?path=78.0.3904.70/. It's the mac_64 version (I'm using Ubuntu on a Mac). I then placed chromedriver in /usr/bin, as I read that's common practice.
I have no idea why this isn't working. A few possibilities I can think of:
some sort of access issue? I'm a beginner with the command line and Ubuntu - should I be running this as the "root" user?
a mismatch between Chromedriver and Chrome versions? Is there a way to tell for certain which chromedriver version I have? (See the snippet after this list.)
I see that Chromedriver and Selenium are in different locations. Selenium is in /home/ubuntu/.local/lib/python3.6/site-packages and I've moved chromedriver to /usr/bin. Could this be causing problems?
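A quick way to check the version and architecture questions above (a hedged sketch; the path assumes chromedriver was copied to /usr/bin):
which chromedriver          # confirm which binary is actually on the PATH
file /usr/bin/chromedriver  # an ELF 64-bit x86-64 executable is expected on Ubuntu;
                            # a Mach-O (macOS) binary here would explain the Exec format error
chromedriver --version      # only runs if the binary matches the host architecture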
Ubuntu Server 18.04 LTS (64-bit x86):
Download Chrome: wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
Install Chrome: sudo dpkg -i google-chrome-stable_current_amd64.deb
If you get an error, run: sudo apt-get -f install
Check Chrome: google-chrome --version
Download chromedriver for Linux: wget https://chromedriver.storage.googleapis.com/78.0.3904.70/chromedriver_linux64.zip
Unzip chromedriver (install unzip with sudo apt install unzip if required): unzip chromedriver_linux64.zip
Move chromedriver to /usr/bin: sudo mv chromedriver /usr/bin/chromedriver
Check chromedriver, run command: chromedriver
Install Java: sudo apt install default-jre
Install Selenium: sudo pip3 install selenium
Create a test file with nano test.py and the content below. Press CTRL+X to exit and then Y to save. Execute your script: python3 test.py
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
try:
    driver = webdriver.Chrome(chrome_options=options)
    driver.get("https://www.google.com")
    s = driver.find_element_by_name("q")
    assert s.is_displayed() is True
    print("ok")
except Exception as ex:
    print(ex)
driver.quit()
Example of using Docker and selenium/standalone-chrome-debug:
Install Docker; the installation steps are here
Start the container using the sudo docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-xenon command; different options are here
Open the Security Group of your instance in AWS and add a TCP rule so you can connect. You can allow only your own IP and port 4444 for Selenium.
Run the test from your local machine:
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Remote(command_executor="http://your_instance_ip:4444/wd/hub",
                          desired_capabilities=options.to_capabilities())
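Once connected, the remote session is driven just like a local driver; a minimal follow-up (the URL is only an example):
driver.get("https://www.google.com")
print(driver.title)   # e.g. "Google"
driver.quit()         # ends the session inside the standalone-chrome container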
I am running the following on EC2 Ubuntu:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options) #Give the full path to chromedriver
Try it. In case it doesn't work, I will look up more of the settings.