Unable to run script/package locally on Docker - python

I'm creating a webscraper package and am in the process of uploading to Docker. Whilst I can build to the local Docker repository, I cannot run the script without the following errors appearing:
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Here is what I have in the main script so far to try and get it running on Docker:
def __init__(self, url: str = " url goes here ",
options: Optional[ChromeOptions] = None): #default url
options = ChromeOptions()
self.driver = Chrome(ChromeDriverManager().install(), options=options)
options.add_argument("--no-sandbox")
options.binary_location = '/usr/bin/google-chrome'
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-setuid-sandbox")
options.add_argument("--remote-debugging-port=9222")
options.add_argument("start-maximized")
options.add_argument('--disable-gpu')
options.add_argument("window-size=1920,1080")
From other posts I note that for some it was as simple as changing the order of options.add_argument, which I have tried but found it doesn't work for me.
I also have the following modules within the same script:
import os
import selenium
from selenium.webdriver import Chrome
from webdriver_manager.chrome import ChromeDriverManager #installs Chrome webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from typing import Optional
import time
import boto3
from sqlalchemy import create_engine
import urllib.request
import tempfile #temporary directory - to be removed after all operations have finished
In my Dockerfile:
FROM python:3.8
#Set Chrome Repo
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -\
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'\
&& apt-get -y update\
#Install Chrome
&& apt-get install -y google-chrome-stable\
&& wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip\
&& apt-get install -yqq unzip\
&& unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
COPY . .
RUN pip install -r requirements.txt
#When we run the container, this will be the command run
CMD ["python", "scraper/webscraper.py"]
Just in case, I am using Docker and VSCode on Windows OS.

I've realised that by changing the order of options.add_arguments and self.driver, the script will run just fine. This is because the driver is being created first when it should be the other way around as follows:
def __init__(self, url: str = " url goes here ",
options: Optional[ChromeOptions] = None): #default url
options = ChromeOptions()
options.add_argument("--no-sandbox")
options.binary_location = '/usr/bin/google-chrome'
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-setuid-sandbox")
options.add_argument("--remote-debugging-port=9222")
options.add_argument("start-maximized")
options.add_argument('--disable-gpu')
options.add_argument("window-size=1920,1080")
self.driver = Chrome(ChromeDriverManager().install(), options=options)

Related

Selenium Chromedriver not working with Google Colab Anymore on Python 3.8.16?

I have been using selenium chromedriver in google colab for the past year and it seems to be working perfectly.
But last week, the script doesnt seem to be working anymore. I looked at the python version of google colab and it's now on python 3.8.16 which I think is the culprit of this code breaking.
I use the code:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver= webdriver.Chrome('chromedriver',options=chrome_options)`
And now in this line:
driver= webdriver.Chrome('chromedriver',options=chrome_options)
I get an error saying:
WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 1
Anyone already found a fix for this?
Seems to be a problem with chromedriver itsself, and not python.
Also, I couldn't reproduce the error.
The execution seems to be fine for me.
You might check, if there's something else in your code which might break it and you didn't provide here.
Run This Codes. For me it Worked
%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap
# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF
# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg
# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500
Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300
Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF
Then Run This
!apt-get update
!apt-get install chromium chromium-driver
!pip3 install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = "http://example.com/"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome("chromedriver", options=options)
driver.get(url)
print(driver.title)
driver.quit()

How to run Chrome Headless in Docker Container with Selenium?

I am trying to run a simple test file that is meant to open google.com on chrome within an openjdk docker container and return "Completely Successfully" upon completion, however, I keep receiving the same error saying that the "service object has no attribute process". This is the error I keep receiving:
Traceback (most recent call last):
File "/NewJersey/test.py", line 60, in <module>
print(main())
^^^^^^
File "/NewJersey/test.py", line 42, in main
driver = webdriver.Chrome(service = service, options=chrome_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
super().__init__(
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chromium/webdriver.py", line 103, in __init__
self.service.start()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 106, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 117, in assert_process_still_running
return_code = self.process.poll()
^^^^^^^^^^^^
AttributeError: 'Service' object has no attribute 'process'
This is the code I am running:
#General Imports
from logging import error
import os
import sys
import time
import os.path
import random
#Selenium Imports (Chrome)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
#ChromeDriver Import
from webdriver_manager.chrome import ChromeDriverManager
def main():
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
service = ChromeService("/chromedriver")
driver = webdriver.Chrome(service = service, options=chrome_options)
try:
completion_msg = reroute(driver)
print(completion_msg)
driver.close()
return "Test Completed Successfully"
except error as Error:
return Error
def reroute(driver):
driver.get("https://www.google.com")
return "Success"
if __name__ == "__main__":
print(main())
This is my docker container:
# syntax=docker/dockerfile:1
FROM openjdk:11
ENV PATH = "${PATH}:/chromedriver/chromedriver.exe"
RUN apt-get update && apt-get install -y \
software-properties-common \
unzip \
curl \
xvfb \
wget \
bzip2 \
snapd
# Chrome
RUN apt-get update && \
apt-get install -y gnupg wget curl unzip --no-install-recommends && \
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list && \
apt-get update -y && \
apt-get install -y google-chrome-stable && \
CHROMEVER=$(google-chrome --product-version | grep -o "[^\.]*\.[^\.]*\.[^\.]*") && \
DRIVERVER=$(curl -s "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_$CHROMEVER") && \
wget -q --continue -P /chromedriver "http://chromedriver.storage.googleapis.com/$DRIVERVER/chromedriver_linux64.zip" && \
unzip /chromedriver/chromedriver* -d /chromedriver
# Python
RUN apt-get update && apt-get install -y \
python2.7 \
python-setuptools \
python3-pip
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD python3 test.py
When I first started my project, I attempted to do it with firefox but due to certain limitations chose to switch to chrome.
After trying to do research, there were suggestions to pass the path of chromedriver to the service object and add the path of chromedriver to the PATH in the docker container, both of which I have already done as shown above. I continue to get the exact same error.
I haven't been able to find any other solutions to the above issue so I would greatly appreciate any help!
In case anyone else stumbles across this and has a similar problem, this is how I solved it.
I simply removed the service object entirely. It seems that for whatever reason, the service object wasn't configured correctly or even needed once I had added the ChromeDriver path to my System Path on the dockerfile. The code snippet now reads like this:
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)

How do I write my Dockerfile to include chromedriver?

I am a newbie to Dockerfile as well as Selenium. I was working on the web scraping using selenium and taking a screenshot. I am trying to dockerize it. This questions of mine seems to be answered in a few questions but it did not solve my error. FYI, I am using a Windows laptop.
The screenshot code works on my local machine but dockerfile seems to be giving me errors.
I am trying to use this version of chromedriver=89.0.4389.82
This is my UPDATED Dockefile,
FROM python:3.6
RUN pip install --upgrade pip && pip install pytest && pip install pytest-mock && pip install pytest-smtp && pip install mock \
pip install schedule && pip install selenium && pip install Selenium-Screenshot && pip install python-dateutil
# For running code
COPY src/screenshotcode.py /
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
RUN apt-get install libxi6 libgconf-2-4 -y
ENV CHROMEDRIVER_VERSION 2.19
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir -p $CHROMEDRIVER_DIR
# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR
# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH
CMD [ "python", "screenshotcode.py" ]
My screenshot code,
import time
from Screenshot import Screenshot_Clipping
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from email_it import email_it
from environmental_variables import environmental_variables
from error_alert_email import error_alert_email
from selenium import webdriver
def screenshot():
ob=Screenshot_Clipping.Screenshot()
chrome_options = Options()
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--start-fullscreen')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
print('taking screenshot...')
img_url=ob.full_Screenshot(driver, path = path, image_name = label)
print('closing driver...')
driver.close()
screenshot()
EDIT: I get the following error
PS C:\Users\me\Documents\Projects\> docker run screenshot
File "scheduler.py", line 16, in <module>
from screenshot import screenshot
File "/screenshotcode.py", line 72, in <module>
screenshot()
File "/screenshotcode.py", line 32, in screenshot
driver = webdriver.Chrome()
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.19.346067 (6abd8652f8bc7a1d825962003ac88ec6a37a82f1),platform=Linux 5.4.72-microsoft-standard-WSL2 x86_64)
You set in the code the chromedriver to be at:
driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
but in your dockerfile you have it at /usr/local/bin/chromedriver
so you need to change your code to
driver = webdriver.Chrome(executable_path = "/usr/local/bin/chromedriver")

None permission for chromedriver.exe in colab

I am trying to run the webdriver resource in the selenium module (python) in Chrome for the google colab. Firstval I have problems to parse the chromedriver.exe file in the command (selenium.webdriver.Chrome('/chromedriver.exe')), overcome that I found the continuos failure of none permission to run the chromedriver.exe, and the version is ok, who knows what possibly is wrong?
WebDriverException: Message: 'chromedriver.exe' executable may have wrong permissions.
You can do it by installing the chromium webdriver and adjusting some options such that it does not crash in google colab:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.webite-url.com")

OSError: [Errno 8] Exec format error: 'chromedriver' using Chromedriver on Ubuntu server

I'm trying to use Chromedriver with Ubuntu (AWS instance). I've gotten Chromedriver to work no problem in a local instance, but having many, many issues doing so in a remote instance.
I'm using the following code:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Chrome(executable_path='/usr/bin/chromedriver', chrome_options=options)
However, I keep getting this error:
Traceback (most recent call last):
File "test.py", line 39, in <module>
driver = webdriver.Chrome()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'chromedriver'
I believe I'm using the most updated version of Selenium, Chrome, and Chromedriver.
Chrome version is:Version 78.0.3904.70 (Official Build) (64-bit)
Selenium:
ubuntu#ip-172-31-31-200:/usr/bin$ pip3 show selenium
Name: selenium
Version: 3.141.0
Summary: Python bindings for Selenium
Home-page: https://github.com/SeleniumHQ/selenium/
Author: UNKNOWN
Author-email: UNKNOWN
License: Apache 2.0
Location: /home/ubuntu/.local/lib/python3.6/site-packages
Requires: urllib3
And, finally, for Chromedriver, I'm almost certain I downloaded the most recent version here: https://chromedriver.storage.googleapis.com/index.html?path=78.0.3904.70/. It's the mac_64 version (I'm using Ubuntu on a Mac). I then placed chromedriver in /usr/bin , as I read that's common practice.
I have no idea why this isn't working. A few options I can think of:
some sort of access issue? I'm a beginner with command line and ubuntu - should I be running this as "root" user?
mis-match between Chromedriver and Chrome versions? Is there a way to tell which chromedriver version I have for certain?
I see that Chromedriver and Selenium are in different locations. Selenium is in: Location: /home/ubuntu/.local/lib/python3.6/site-packages and I've moved chromedriver to: /usr/bin . Could this be causing problems?
Ubuntu Server 18.04 LTS (64-bit Arm):
Download Chrome: wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
Install Chrome: sudo dpkg -i google-chrome-stable_current_amd64.deb
If you'll get error run: sudo apt-get -f install
Check Chrome: google-chrome --version
Download chromedriver for Linux: wget https://chromedriver.storage.googleapis.com/78.0.3904.70/chromedriver_linux64.zip
Unzip chromedriver, install unzip sudo apt install unzip if required: unzip chromedriver_linux64.zip
Move chromedriver to /usr/bin: sudo mv chromedriver /usr/bin/chromedriver
Check chromedriver, run command: chromedriver
Install Java: sudo apt install default-jre
Install Selenium: sudo pip3 install selenium
Create test file, nano test.py with content below. Press CTRL+X to exit and the Y to save. Execute your script - python3 test.py
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
try:
driver = webdriver.Chrome(chrome_options=options)
driver.get("https://www.google.com")
s = driver.find_element_by_name("q")
assert s.is_displayed() is True
print("ok")
except Exception as ex:
print(ex)
driver.quit()
Example of using Docker and selenium/standalone-chrome-debug:
Install docker, installation steps are here
Start container, using sudo docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-xenon command, different options are here
Open Security Group of your instance in AWS and add TCP rule to be able to connect. You can add only your own IP and port 4444 for Selenium
Run test from local
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Remote(command_executor="http://your_instance_ip:4444/wd/hub",
desired_capabilities=options.to_capabilities())
I am running the following on ec2-ubuntu:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options) #Give the full path to chromedriver
Try it. Incase it doesn't work I will find more of the settings.

Categories