How to run Chrome Headless in Docker Container with Selenium? - python

I am trying to run a simple test file that opens google.com in Chrome inside an openjdk Docker container and returns "Test Completed Successfully" when it finishes. However, I keep getting the same error, saying that the 'Service' object has no attribute 'process'. This is the error:
Traceback (most recent call last):
File "/NewJersey/test.py", line 60, in <module>
print(main())
^^^^^^
File "/NewJersey/test.py", line 42, in main
driver = webdriver.Chrome(service = service, options=chrome_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
super().__init__(
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chromium/webdriver.py", line 103, in __init__
self.service.start()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 106, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 117, in assert_process_still_running
return_code = self.process.poll()
^^^^^^^^^^^^
AttributeError: 'Service' object has no attribute 'process'
This is the code I am running:
#General Imports
from logging import error
import os
import sys
import time
import os.path
import random
#Selenium Imports (Chrome)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
#ChromeDriver Import
from webdriver_manager.chrome import ChromeDriverManager
def main():
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-gpu")
    service = ChromeService("/chromedriver")
    driver = webdriver.Chrome(service = service, options=chrome_options)
    try:
        completion_msg = reroute(driver)
        print(completion_msg)
        driver.close()
        return "Test Completed Successfully"
    except error as Error:
        return Error

def reroute(driver):
    driver.get("https://www.google.com")
    return "Success"

if __name__ == "__main__":
    print(main())
This is my Dockerfile:
# syntax=docker/dockerfile:1
FROM openjdk:11
ENV PATH = "${PATH}:/chromedriver/chromedriver.exe"
RUN apt-get update && apt-get install -y \
software-properties-common \
unzip \
curl \
xvfb \
wget \
bzip2 \
snapd
# Chrome
RUN apt-get update && \
apt-get install -y gnupg wget curl unzip --no-install-recommends && \
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list && \
apt-get update -y && \
apt-get install -y google-chrome-stable && \
CHROMEVER=$(google-chrome --product-version | grep -o "[^\.]*\.[^\.]*\.[^\.]*") && \
DRIVERVER=$(curl -s "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_$CHROMEVER") && \
wget -q --continue -P /chromedriver "http://chromedriver.storage.googleapis.com/$DRIVERVER/chromedriver_linux64.zip" && \
unzip /chromedriver/chromedriver* -d /chromedriver
# Python
RUN apt-get update && apt-get install -y \
python2.7 \
python-setuptools \
python3-pip
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD python3 test.py
When I first started my project, I attempted it with Firefox, but due to certain limitations I switched to Chrome.
After some research, I found suggestions to pass the chromedriver path to the Service object and to add the chromedriver path to PATH in the Docker container, both of which I have already done as shown above. I still get exactly the same error.
I haven't been able to find any other solutions, so I would greatly appreciate any help!

In case anyone else stumbles across this with a similar problem, this is how I solved it.
I simply removed the Service object entirely. For whatever reason, the Service object was either not configured correctly or not needed at all once I had added the ChromeDriver path to PATH in the Dockerfile. The code snippet now reads like this:
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)
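Whatever the exact cause of the AttributeError, the path handed to the Service object is worth checking: in the Dockerfile above, /chromedriver is the directory the archive is unzipped into, so the executable itself would sit at /chromedriver/chromedriver. The following is a minimal sketch under that assumption. `resolve_chromedriver` is a hypothetical helper name, not part of Selenium; it uses only the standard library to turn a directory-or-file path into the path of an executable file.

```python
import os


def resolve_chromedriver(path, name="chromedriver"):
    """Return the path of an executable driver file.

    `path` may be the driver file itself or the directory that contains it.
    Hypothetical helper for illustration; not part of Selenium.
    """
    candidate = os.path.join(path, name) if os.path.isdir(path) else path
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"no driver file at {candidate!r}")
    if not os.access(candidate, os.X_OK):
        raise PermissionError(f"{candidate!r} is not executable")
    return candidate


# Usage with Selenium 4 (assumes the Dockerfile layout shown in the question):
# from selenium import webdriver
# from selenium.webdriver.chrome.service import Service
# service = Service(resolve_chromedriver("/chromedriver"))
# driver = webdriver.Chrome(service=service, options=chrome_options)
```

Failing early with a clear message makes a wrong path easier to spot than an error raised deep inside the driver start-up code.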

Related

Selenium Chromedriver not working with Google Colab Anymore on Python 3.8.16?

I have been using Selenium with chromedriver in Google Colab for the past year and it worked perfectly.
But as of last week, the script no longer works. I checked the Python version in Google Colab and it is now 3.8.16, which I suspect is what broke this code.
I use this code:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver= webdriver.Chrome('chromedriver',options=chrome_options)
And now in this line:
driver= webdriver.Chrome('chromedriver',options=chrome_options)
I get an error saying:
WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 1
Has anyone already found a fix for this?
This seems to be a problem with chromedriver itself, not with Python.
Also, I couldn't reproduce the error.
The code runs fine for me.
You might check whether something else in your code, which you didn't include here, is breaking it.
Run this code. It worked for me:
%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap
# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF
# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg
# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF
Then run this:
!apt-get update
!apt-get install chromium chromium-driver
!pip3 install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = "http://example.com/"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome("chromedriver", options=options)
driver.get(url)
print(driver.title)
driver.quit()
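When the only symptom is "Service chromedriver unexpectedly exited. Status code was: 1", it can help to run the driver binary directly and read what it prints. The following is a minimal sketch using only the standard library; `probe_binary` is a hypothetical helper name introduced here for illustration.

```python
import shutil
import subprocess


def probe_binary(argv, timeout=30):
    """Run a command and return (resolved_path, exit_code, combined_output).

    resolved_path is None (and nothing is run) when argv[0] is not on PATH.
    """
    resolved = shutil.which(argv[0])
    if resolved is None:
        return None, None, ""
    proc = subprocess.run(
        [resolved, *argv[1:]],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return resolved, proc.returncode, proc.stdout.strip()


# Example in a notebook cell:
# print(probe_binary(["chromedriver", "--version"]))
```

A non-zero exit code, or output that mentions snap instead of a version number, would point at the packaging issue that the answer above works around.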

Running undetected_chromedriver on AWS Lambda: Message: 'e39098076c0be4f2_chromedriver' executable needs to be in PATH

I'm trying to deploy a Python automation script on AWS Lambda using a Docker image. I'm sure I did everything right regarding the PATH and the installation, but when I run it on AWS I get this weird error message:
"errorMessage": "Message: 'e39098076c0be4f2_chromedriver' executable needs to be in PATH."
What is weird about it is that there is always a random string in front of chromedriver. The usual error message looks like this:
"errorMessage": "Message: 'chromedriver' executable needs to be in PATH."
The complete log for the error I get:
{
"errorMessage": "Message: 'e39098076c0be4f2_chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home\n",
"errorType": "WebDriverException",
"requestId": "107a17da-65d4-4bec-a5dc-1e69c1b63fe7",
"stackTrace": [
" File \"/home/app/app.py\", line 60, in lambda_handler\n driver = uc.Chrome(executable_path=r\"/tmp/chromedriver\")\n",
" File \"/home/app/undetected_chromedriver/__init__.py\", line 409, in __init__\n super(Chrome, self).__init__(\n",
" File \"/home/app/selenium/webdriver/chrome/webdriver.py\", line 69, in __init__\n super().__init__(DesiredCapabilities.CHROME['browserName'], \"goog\",\n",
" File \"/home/app/selenium/webdriver/chromium/webdriver.py\", line 89, in __init__\n self.service.start()\n",
" File \"/home/app/selenium/webdriver/common/service.py\", line 81, in start\n raise WebDriverException(\n"
]
}
I have the executable file on PATH, but its name is chromedriver.
My Dockerfile:
# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.10"
ARG DISTRO_VERSION="3.16"
# Stage 1 - bundle base image + runtime
# Grab a fresh copy of the image and install GCC
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
# Install GCC (Alpine uses musl but we compile and link dependencies with GCC)
RUN apk add --no-cache \
libstdc++
# Stage 2 - build function and dependencies
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
build-base \
libtool \
autoconf \
automake \
libexecinfo-dev \
make \
cmake \
libcurl \
curl \
gcc \
g++
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy required files
COPY patcher.py ${FUNCTION_DIR}
COPY app.py ${FUNCTION_DIR}
COPY requirements.txt .
COPY edit_excutable.py ${FUNCTION_DIR}
# Optional – Install the function's dependencies
RUN python${RUNTIME_VERSION} -m pip install --upgrade pip
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
# Fix undetected_chromedriver to use in lambda
RUN cd ${FUNCTION_DIR} && cp -f patcher.py ${FUNCTION_DIR}/undetected_chromedriver
# Install Lambda Runtime Interface Client for Python
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-alpine
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
COPY edit_excutable.py /usr/bin/
RUN apk add chromium-chromedriver
RUN wget https://chromedriver.storage.googleapis.com/107.0.5304.62/chromedriver_linux64.zip
#RUN python${RUNTIME_VERSION} edit_excutable.py
RUN cp /usr/bin/chromedriver ${FUNCTION_DIR}
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
COPY entry.sh /
RUN chmod 755 /usr/bin/aws-lambda-rie /entry.sh
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.lambda_handler" ]
And here is the snippet of my code that generates the error:
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
import undetected_chromedriver.v2 as uc
import subprocess
import shutil
import time
BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd()
def lambda_handler(event, context):
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    os.environ["PATH"] += os.pathsep + BIN_DIR
    os.environ["PATH"] += os.pathsep + CURR_BIN_DIR
    print (os.environ)
    chrome_options = uc.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1024x768')
    chrome_options.add_argument('--user-data-dir=/tmp/user-data')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--data-path=/tmp/data-path')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--homedir=/tmp')
    chrome_options.add_argument('--disk-cache-dir=/tmp/cache-dir')
    chrome_options.binary_location = "/tmp/chromedriver"
    options = {'request_storage_base_dir': '/tmp' }
    os.system("cp ./chromedriver /tmp/chromedriver")
    os.chmod("/tmp/chromedriver", 0o777)
    driver = uc.Chrome(executable_path=r"/tmp/chromedriver", chrome_options=chrome_options)
I tried different versions of the runtime, chromedriver and the distro.
I solved the problem in the code by using the driver_executable_path argument instead of executable_path.
I also changed the Dockerfile to install the Chromium browser first and then chromedriver:
RUN apk add chromium
RUN apk add chromium-chromedriver
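A minimal sketch of the code-side fix, using only the standard library for the copy step. `stage_executable` is a hypothetical helper name; it replaces the `os.system("cp ...")` call with `shutil.copy2` and makes the copy executable. The commented usage assumes, as the answer states, that `uc.Chrome` accepts `driver_executable_path`. Note also that `binary_location` is meant to point at the browser binary rather than at chromedriver, so the question's `chrome_options.binary_location = "/tmp/chromedriver"` line is probably worth removing.

```python
import os
import shutil
import stat


def stage_executable(src, dest_dir="/tmp"):
    """Copy src into dest_dir, mark it executable, and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest


# Usage inside the handler (assumes ./chromedriver was copied into the image):
# driver_path = stage_executable("./chromedriver", "/tmp")
# driver = uc.Chrome(driver_executable_path=driver_path, options=chrome_options)
```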

Running a Selenium webscraper in AWS Lambda using Docker

I am trying to create a simple python Lambda app using SAM CLI that gets the number of followers for a particular handle. Have looked at ALL tutorials and blog posts and yet have not been able to make it work.
The build and local test works fine using sam build and sam local invoke, however, after deployment to AWS Lambda it throws the following error.
Any ideas how to solve this?
{
"errorMessage": "Message: unknown error: Chrome failed to start: crashed.\n (chrome not reachable)\n (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n",
"errorType": "WebDriverException",
"stackTrace": [
" File \"/var/task/app.py\", line 112, in main\n data = Twitter(StockInfo_list)\n",
" File \"/var/task/app.py\", line 36, in Twitter\n driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)\n",
" File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n RemoteWebDriver.__init__(\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n self.start_session(capabilities, browser_profile)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n response = self.execute(Command.NEW_SESSION, parameters)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n self.error_handler.check_response(response)\n",
" File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n raise exception_class(message, screen, stacktrace)\n"
]
}
I'm using the following as my Dockerfile
FROM public.ecr.aws/lambda/python:3.8
# Update repository and install unzip
RUN yum update -y
RUN yum install unzip -y
# Download and install Google Chrome
RUN curl https://intoli.com/install-google-chrome.sh | bash
# Download and install ChromeDriver
RUN CHROME_DRIVER_VERSION=`curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
wget -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip && \
unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN echo $(chromedriver --version)
# Upgrade PIP
RUN /var/lang/bin/python3.8 -m pip install --upgrade pip
# Install requirements (including selenium)
COPY app.py requirements.txt ./
RUN python3.8 -m pip install -r requirements.txt -t .
# Command can be overwritten by providing a different command in the template directly.
CMD ["app.main"]
My application looks like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import os
import time, datetime
import pandas as pd
import csv
# import mysql.connector
from datetime import date, datetime as dt1
def Twitter(twitter_stock_id):
    twitterlist = []
    stockids = []
    twitterlist = twitter_stock_id["TwitterUrl"].str.lower().tolist()
    stockids = twitter_stock_id["stockid"].str.lower().tolist()
    chrome_optionsdata = webdriver.ChromeOptions()
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_optionsdata.add_experimental_option("prefs", prefs)
    chrome_optionsdata.add_argument("--headless")
    chrome_optionsdata.add_argument("--no-sandbox")
    chrome_optionsdata.add_argument("--disable-dev-shm-usage")
    chrome_optionsdata.add_argument("--disable-gpu")
    chrome_optionsdata.add_argument("--disable-gpu-sandbox")
    chromedriver_path = "/usr/local/bin/chromedriver"
    driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)
    lfollowers = []
    for url in twitterlist:
        tempurl = driver.get(url)
        time.sleep(10)
        followers = driver.find_elements_by_class_name("r-qvutc0")
        for fol in followers:
            if "Followers" in fol.text:
                tempstr = fol.text.split(" ")
                lfollowers.append(tempstr[0])
        time.sleep(5)
    lmaindata = []
    for fld in lfollowers:
        if fld != "Followers":
            lmaindata.append(fld)
    print("Followers" + str(lmaindata))
    driver.quit()
    return f"Followers: {lmaindata}"

import json

def main(event, context):
    StockInfo_list = pd.DataFrame([{"TwitterUrl": "https://twitter.com/costco", "stockid": "COST"}])
    data = Twitter(StockInfo_list)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello", "data": data}),
    }
There were three key reasons why this script didn't work:
Lambda restricts writes to the /tmp/ folder
The executables were not in a location on PATH
Chromium was missing dependencies
To fix this,
I adapted a shell script that downloads specific, mutually compatible versions of Chromium and the Chromium webdriver into the /tmp/ folder and then installs them at /opt/.
#!/usr/bin/bash
declare -A chrome_versions
# Enter the list of browsers to be downloaded
### Using Chromium as documented here - https://www.chromium.org/getting-involved/download-chromium
chrome_versions=( ['89.0.4389.47']='843831' )
chrome_drivers=( "89.0.4389.23" )
#firefox_versions=( "86.0" "87.0b3" )
#gecko_drivers=( "0.29.0" )
# Download Chrome
for br in "${!chrome_versions[@]}"
do
    echo "Downloading Chrome version $br"
    mkdir -p "/opt/chrome/stable"
    curl -Lo "/opt/chrome/stable/chrome-linux.zip" \
        "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F${chrome_versions[$br]}%2Fchrome-linux.zip?alt=media"
    unzip -q "/opt/chrome/stable/chrome-linux.zip" -d "/opt/chrome/stable/"
    mv /opt/chrome/stable/chrome-linux/* /opt/chrome/stable/
    rm -rf /opt/chrome/stable/chrome-linux "/opt/chrome/stable/chrome-linux.zip"
done
# Download Chromedriver
for dr in "${chrome_drivers[@]}"
do
    echo "Downloading Chromedriver version $dr"
    mkdir -p "/opt/chromedriver/stable/"
    curl -Lo "/opt/chromedriver/stable//chromedriver_linux64.zip" \
        "https://chromedriver.storage.googleapis.com/$dr/chromedriver_linux64.zip"
    unzip -q "/opt/chromedriver/stable//chromedriver_linux64.zip" -d "/opt/chromedriver/stable/"
    chmod +x "/opt/chromedriver/stable/chromedriver"
    rm -rf "/opt/chromedriver/stable/chromedriver_linux64.zip"
done
echo "Chrome & Chromedriver installed"
I changed the Dockerfile to the following:
FROM public.ecr.aws/lambda/python:3.8 as base
# Hack to install chromium dependencies
RUN yum install -y -q unzip
RUN yum install -y https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
# Install Chromium
COPY install-browser.sh /tmp/
RUN /usr/bin/bash /tmp/install-browser.sh
#FROM public.ecr.aws/lambda/python:3.8
# Install Python dependencies for function
COPY requirements.txt /tmp/
RUN pip install --upgrade pip -q
RUN pip install -r /tmp/requirements.txt -q
COPY app.py ./
CMD [ "app.handler" ]
Finally, the missing dependencies were resolved in this Dockerfile by installing Chrome directly, which pulled in the 122 packages needed to run Chrome.
I've put this in a GitHub Repository here and explained the steps in a blog post here.
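On the Python side, the first issue (only /tmp is writable) is commonly handled by pointing Chrome's writable directories at /tmp. The following is a minimal sketch, assuming the install locations produced by the script above. `lambda_chrome_args`, `CHROME_BINARY` and `CHROMEDRIVER_BINARY` are hypothetical names, and the flag list is a reasonable starting point rather than a verified minimum.

```python
# Paths assumed from the install script above.
CHROME_BINARY = "/opt/chrome/stable/chrome"
CHROMEDRIVER_BINARY = "/opt/chromedriver/stable/chromedriver"


def lambda_chrome_args(tmp_root="/tmp"):
    """Build Chrome command-line flags that keep profile and cache under tmp_root."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        f"--user-data-dir={tmp_root}/user-data",
        f"--disk-cache-dir={tmp_root}/cache-dir",
    ]


# Usage with Selenium (sketch, same Selenium 3 style as the question):
# options = webdriver.ChromeOptions()
# options.binary_location = CHROME_BINARY
# for arg in lambda_chrome_args():
#     options.add_argument(arg)
# driver = webdriver.Chrome(executable_path=CHROMEDRIVER_BINARY, options=options)
```

Passing the explicit binary and driver paths also sidesteps the second issue, since nothing then depends on PATH lookup.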

How do I write my Dockerfile to include chromedriver?

I am a newbie to Dockerfiles as well as Selenium. I was working on web scraping using Selenium and taking a screenshot, and I am trying to dockerize it. This question of mine seems to be answered in a few other questions, but they did not solve my error. FYI, I am using a Windows laptop.
The screenshot code works on my local machine, but the Dockerfile seems to be giving me errors.
I am trying to use this version of chromedriver=89.0.4389.82
This is my UPDATED Dockerfile:
FROM python:3.6
RUN pip install --upgrade pip && pip install pytest && pip install pytest-mock && pip install pytest-smtp && pip install mock && \
    pip install schedule && pip install selenium && pip install Selenium-Screenshot && pip install python-dateutil
# For running code
COPY src/screenshotcode.py /
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
RUN apt-get install libxi6 libgconf-2-4 -y
ENV CHROMEDRIVER_VERSION 2.19
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir -p $CHROMEDRIVER_DIR
# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR
# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH
CMD [ "python", "screenshotcode.py" ]
My screenshot code,
import time
from Screenshot import Screenshot_Clipping
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from email_it import email_it
from environmental_variables import environmental_variables
from error_alert_email import error_alert_email
from selenium import webdriver
def screenshot():
    ob=Screenshot_Clipping.Screenshot()
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument('--start-fullscreen')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
    print('taking screenshot...')
    img_url=ob.full_Screenshot(driver, path = path, image_name = label)
    print('closing driver...')
    driver.close()

screenshot()
EDIT: I get the following error
PS C:\Users\me\Documents\Projects\> docker run screenshot
File "scheduler.py", line 16, in <module>
from screenshot import screenshot
File "/screenshotcode.py", line 72, in <module>
screenshot()
File "/screenshotcode.py", line 32, in screenshot
driver = webdriver.Chrome()
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.19.346067 (6abd8652f8bc7a1d825962003ac88ec6a37a82f1),platform=Linux 5.4.72-microsoft-standard-WSL2 x86_64)
In your code, you set chromedriver to be at:
driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
but that Windows path does not exist inside the container. The Dockerfile above unzips the driver into $CHROMEDRIVER_DIR, so it ends up at /chromedriver/chromedriver,
so you need to change your code to
driver = webdriver.Chrome(executable_path = "/chromedriver/chromedriver")
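To avoid hard-coding a Windows path that cannot exist inside the container, the driver location can be resolved at run time. The following is a minimal sketch using only the standard library; `find_chromedriver` and the `CHROMEDRIVER_PATH` environment variable are hypothetical names introduced for illustration. It relies on the driver directory being on PATH, as the `ENV PATH` line in the Dockerfile arranges.

```python
import os
import shutil


def find_chromedriver(env_var="CHROMEDRIVER_PATH"):
    """Return the driver path from env_var if set, else the first match on PATH."""
    explicit = os.environ.get(env_var)
    if explicit:
        return explicit
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver not found; set " + env_var)
    return found


# Usage (Selenium 3 style, as in the question). The options object must be
# passed too, otherwise --headless and --no-sandbox are never applied:
# driver = webdriver.Chrome(executable_path=find_chromedriver(),
#                           options=chrome_options)
```

Separately, the traceback reports chromedriver=2.19 while the question asks for 89.0.4389.82, so CHROMEDRIVER_VERSION in the Dockerfile may also need updating to match the installed Chrome.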

Docker container returns an error with selenium webdriver

I've been struggling with this issue for a while now. I am trying to build a Docker container that scrapes some data with Selenium WebDriver, and I get an error saying a module is not callable. See:
> [stage-1 6/6] RUN python db_starter.py:
#10 35.99 Traceback (most recent call last):
#10 35.99 File "db_starter.py", line 3, in <module>
#10 35.99 run_backend.update_db()
#10 35.99 File "/app/run_backend.py", line 11, in update_db
#10 35.99 search_page = donwload_search_page(query, page)
#10 35.99 File "/app/get_data.py", line 19, in donwload_search_page
#10 35.99 soup = BeautifulSoup(html, 'html.parser')
#10 35.99 TypeError: 'module' object is not callable
Here is my Dockerfile, I tried either with Chrome and Firefox and the error is the same:
FROM scrapinghub/scrapinghub-stack-scrapy:1.3
from python:3.7-slim
COPY . /app
WORKDIR /app
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates curl firefox-esr \
&& rm -fr /var/lib/apt/lists/* \
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz | tar xz -C /usr/local/bin \
&& apt-get purge -y ca-certificates curl
RUN pip install --no-cache-dir -r requirements.txt
RUN python db_starter.py
And here is where the code is crashing:
import requests as rq
import bs4 as BeautifulSoup
import time
import os
from selenium import webdriver
def donwload_search_page(query, page):
    options = webdriver.FirefoxOptions()
    options.add_argument("--window-size 1920,1080")
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    url = "https://www.amazon.com/s?k={query}&page={page}".format(query = query, page = page)
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    driver.close()
    time.sleep(2)
    return soup.text
I really don't get why it says the module is not callable. I ran the code on my machine, in a Jupyter notebook with geckodriver in the same folder, and it works; when I try to build a container with it, it returns this error.
Can any of you help me with this one?
Thank you!
I found the error. It was a beginner's mistake.
import bs4 as BeautifulSoup
should have been
from bs4 import BeautifulSoup
Thanks to those who checked.
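The difference is easy to reproduce with any module. The following is a minimal sketch that uses the standard library's `collections` module in place of `bs4` (a substitution made only so the example runs without third-party packages): aliasing a module and calling the alias raises the same TypeError, while importing the class from the module works.

```python
# Binds the *module* to the name, like `import bs4 as BeautifulSoup`.
import collections as OrderedDict

try:
    OrderedDict()  # calling a module object fails
except TypeError as exc:
    module_call_error = str(exc)

# Rebinds the name to the *class*, like `from bs4 import BeautifulSoup`.
from collections import OrderedDict

ordered = OrderedDict(a=1)  # works: the name now refers to a callable class
print(module_call_error)
```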
