Headless Chrome in Docker with Python: Chrome failed to start: crashed

I want to run this simple script inside a docker container:
def hi_chrome():
    from xvfbwrapper import Xvfb
    from splinter import Browser

    vdisplay = Xvfb()
    vdisplay.start()
    print "spawning connector"
    oBrowser = Browser('chrome')
    oBrowser.visit("http://google.co.za")
    assert oBrowser.title == "Google"
    print "yay"
    vdisplay.stop()

if __name__ == '__main__':
    hi_chrome()
I've gotten the script to run in a virtual environment by doing all the pip and apt-get installs listed in my Dockerfile and just running the script. But when I try to run it inside a container I get:
Traceback (most recent call last):
  File "app.py", line 19, in <module>
    hi_chrome()
  File "app.py", line 10, in hi_chrome
    oBrowser = Browser('chrome')
  File "/usr/local/lib/python2.7/dist-packages/splinter/browser.py", line 63, in Browser
    return driver(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/chrome.py", line 31, in __init__
    self.driver = Chrome(chrome_options=options, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed
  (Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 4.8.0-34-generic x86_64)
I've had similar problems when trying to run my script using other containers from Docker Hub. I've tried using Chrome instead of Chromium and I've tried some ready-made images, but I keep finding broken nonsense. This should be simple.
My main suspicion is that it's a versioning thing, but it works in the venv, so that doesn't make too much sense. Or Docker just needs something fancy to get the Chrome webdriver to run.
Can someone please point out my obvious and noobish mistake?
My Dockerfile looks like:
FROM ubuntu:16.04

RUN apt-get update -y && \
    apt-get install -y python-pip python-dev xvfb chromium-browser && \
    pip install --upgrade pip setuptools
RUN pip install chromedriver_installer

COPY ./requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install -r requirements.txt
COPY . /app

ENTRYPOINT [ "python" ]
CMD [ "app.py" ]
And requirements.txt:
splinter==0.7.5
xvfbwrapper==0.2.8

I found an image that sorta worked and then beat it into submission... The nice thing about this solution is that it doesn't need xvfbwrapper, so it's nice and simple.
app.py:
def hi_chrome():
    # from xvfbwrapper import Xvfb
    from splinter import Browser

    # vdisplay = Xvfb()
    # vdisplay.start()
    print "spawning connector"
    oBrowser = Browser('chrome')
    oBrowser.visit("http://google.co.za")
    assert oBrowser.title == "Google"
    print "yay"
    # vdisplay.stop()

if __name__ == '__main__':
    hi_chrome()
requirements.txt:
splinter==0.7.5
Dockerfile
FROM markadams/chromium-xvfb

RUN apt-get update && apt-get install -y \
    python python-pip curl unzip libgconf-2-4

ENV CHROMEDRIVER_VERSION 2.26
RUN curl -SLO "https://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip" \
    && unzip "chromedriver_linux64.zip" -d /usr/local/bin \
    && rm "chromedriver_linux64.zip"

COPY requirements.txt /usr/src/app/requirements.txt
WORKDIR /usr/src/app
RUN pip install -r requirements.txt
COPY . /usr/src/app

ENTRYPOINT [ "python" ]
CMD [ "app.py" ]
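As a follow-up on why the original ubuntu:16.04 attempt crashed: Chrome and Chromium generally refuse to start as root inside a container unless the sandbox is disabled, which fits the bare "Chrome failed to start: crashed" message. A minimal selenium-only sketch of the usual container flags (an illustration assuming selenium 3.x; splinter 0.7.5 builds its own ChromeOptions internally, which is why the prebuilt image above is the simpler route):
# Hedged sketch, not from the original post: pass container-friendly flags
# straight through selenium instead of splinter.
from selenium import webdriver

opts = webdriver.ChromeOptions()
opts.add_argument('--no-sandbox')             # Chrome won't run as root without it
opts.add_argument('--disable-dev-shm-usage')  # containers often have a tiny /dev/shm
opts.add_argument('--headless')               # no X server needed, so no xvfb either
driver = webdriver.Chrome(chrome_options=opts)  # selenium 3.x keyword
driver.get('http://google.co.za')
assert driver.title == "Google"
driver.quit()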

Related

How do you set up a docker container that depends on multiple python libraries being installed?

I am trying to create a docker container to always run mypy in the same environment. The library I want to run mypy on has multiple dependencies, so I have to install those first and have access to them while evaluating the library that was passed. This is what it currently looks like; in this example I am only installing scipy as an external dependency, later I would install from a regular requirements.txt file instead:
FROM ubuntu:22.04 as builder
RUN apt-get update && apt-get install -y \
    bc \
    gcc \
    musl-dev \
    python3-pip \
    python3 \
    python3-dev
RUN python3.10 -m pip install --no-cache-dir --no-compile scipy && \
    python3.10 -m pip install --no-cache-dir --no-compile mypy

FROM ubuntu:22.04 as production
RUN apt-get update && apt-get install -y \
    python3
COPY --from=builder /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=builder /usr/local/bin/mypy /usr/local/bin/mypy
WORKDIR /data
ENTRYPOINT ["python3.10", "-m", "mypy"]
I install and run my container with
docker build -t my-package-mypy . && docker run -v $(pwd):/data my-package-mypy main.py
Where main.py is a simple one line script that only imports scipy.
This returns the following output:
main.py:1: error: Cannot find implementation or library stub for module named "scipy" [import]
main.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mypy/__main__.py", line 37, in <module>
    console_entry()
  File "/usr/local/lib/python3.10/dist-packages/mypy/__main__.py", line 15, in console_entry
    main()
  File "mypy/main.py", line 95, in main
  File "mypy/main.py", line 174, in run_build
  File "mypy/build.py", line 193, in build
  File "mypy/build.py", line 302, in _build
  File "mypy/build.py", line 3579, in record_missing_stub_packages
PermissionError: [Errno 13] Permission denied: '.mypy_cache/missing_stubs'
Most importantly, the first line says it cannot find the installation of scipy even though it was installed alongside mypy. How can I adjust my Dockerfile to get it to work as described?
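One quick way to narrow this down (a hedged diagnostic, not part of the original post): check from inside the production image whether the copied packages are importable at all. If plain Python can import scipy but mypy still reports a missing stub, the COPY paths are fine and the problem lies in mypy's stub handling rather than in the file layout. Assuming a hypothetical helper script check_imports.py mounted into /data like main.py, run it with docker run -v $(pwd):/data --entrypoint python3.10 my-package-mypy /data/check_imports.py:
# check_imports.py -- hypothetical diagnostic, not from the original question.
import scipy
import mypy

# If both print paths under /usr/local/lib/python3.10/dist-packages, the
# multi-stage COPY worked and the "missing stub" error is about type
# information, not about the packages being absent.
print(scipy.__file__)
print(mypy.__file__)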

Running a Selenium webscraper in AWS Lambda using Docker

I am trying to create a simple Python Lambda app using the SAM CLI that gets the number of followers for a particular Twitter handle. I have looked at ALL the tutorials and blog posts and yet have not been able to make it work.
The build and local test work fine using sam build and sam local invoke; however, after deployment to AWS Lambda it throws the following error.
Any ideas how to solve this?
{
  "errorMessage": "Message: unknown error: Chrome failed to start: crashed.\n (chrome not reachable)\n (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n",
  "errorType": "WebDriverException",
  "stackTrace": [
    " File \"/var/task/app.py\", line 112, in main\n data = Twitter(StockInfo_list)\n",
    " File \"/var/task/app.py\", line 36, in Twitter\n driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)\n",
    " File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n RemoteWebDriver.__init__(\n",
    " File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n self.start_session(capabilities, browser_profile)\n",
    " File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n response = self.execute(Command.NEW_SESSION, parameters)\n",
    " File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n self.error_handler.check_response(response)\n",
    " File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n raise exception_class(message, screen, stacktrace)\n"
  ]
}
I'm using the following as my Dockerfile
FROM public.ecr.aws/lambda/python:3.8

# Update repository and install unzip
RUN yum update -y
RUN yum install unzip -y

# Download and install Google Chrome
RUN curl https://intoli.com/install-google-chrome.sh | bash

# Download and install ChromeDriver
RUN CHROME_DRIVER_VERSION=`curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
    wget -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip && \
    unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN echo $(chromedriver --version)

# Upgrade pip
RUN /var/lang/bin/python3.8 -m pip install --upgrade pip

# Install requirements (including selenium)
COPY app.py requirements.txt ./
RUN python3.8 -m pip install -r requirements.txt -t .

# Command can be overwritten by providing a different command in the template directly.
CMD ["app.main"]
My application looks like:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import os
import time, datetime
import pandas as pd
import csv
# import mysql.connector
from datetime import date, datetime as dt1
import json


def Twitter(twitter_stock_id):
    twitterlist = []
    stockids = []
    twitterlist = twitter_stock_id["TwitterUrl"].str.lower().tolist()
    stockids = twitter_stock_id["stockid"].str.lower().tolist()

    chrome_optionsdata = webdriver.ChromeOptions()
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_optionsdata.add_experimental_option("prefs", prefs)
    chrome_optionsdata.add_argument("--headless")
    chrome_optionsdata.add_argument("--no-sandbox")
    chrome_optionsdata.add_argument("--disable-dev-shm-usage")
    chrome_optionsdata.add_argument("--disable-gpu")
    chrome_optionsdata.add_argument("--disable-gpu-sandbox")

    chromedriver_path = "/usr/local/bin/chromedriver"
    driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)

    lfollowers = []
    for url in twitterlist:
        tempurl = driver.get(url)
        time.sleep(10)
        followers = driver.find_elements_by_class_name("r-qvutc0")
        for fol in followers:
            if "Followers" in fol.text:
                tempstr = fol.text.split(" ")
                lfollowers.append(tempstr[0])
        time.sleep(5)

    lmaindata = []
    for fld in lfollowers:
        if fld != "Followers":
            lmaindata.append(fld)
    print("Followers" + str(lmaindata))

    driver.quit()
    return f"Followers: {lmaindata}"


def main(event, context):
    StockInfo_list = pd.DataFrame([{"TwitterUrl": "https://twitter.com/costco", "stockid": "COST"}])
    data = Twitter(StockInfo_list)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello", "data": data}),
    }
There were three key issues why this script didn't work:
Lambda only allows writes to the /tmp/ folder.
The executables were not at a location on PATH.
Dependencies for Chromium were missing.
To fix this, I adapted a shell script that downloads a specific, mutually compatible pair of Chromium and ChromeDriver versions and installs them under /opt/:
#!/usr/bin/bash

declare -A chrome_versions

# Enter the list of browsers to be downloaded
### Using Chromium as documented here - https://www.chromium.org/getting-involved/download-chromium
chrome_versions=( ['89.0.4389.47']='843831' )
chrome_drivers=( "89.0.4389.23" )
#firefox_versions=( "86.0" "87.0b3" )
#gecko_drivers=( "0.29.0" )

# Download Chrome
for br in "${!chrome_versions[@]}"
do
    echo "Downloading Chrome version $br"
    mkdir -p "/opt/chrome/stable"
    curl -Lo "/opt/chrome/stable/chrome-linux.zip" \
        "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F${chrome_versions[$br]}%2Fchrome-linux.zip?alt=media"
    unzip -q "/opt/chrome/stable/chrome-linux.zip" -d "/opt/chrome/stable/"
    mv /opt/chrome/stable/chrome-linux/* /opt/chrome/stable/
    rm -rf /opt/chrome/stable/chrome-linux "/opt/chrome/stable/chrome-linux.zip"
done

# Download Chromedriver
for dr in "${chrome_drivers[@]}"
do
    echo "Downloading Chromedriver version $dr"
    mkdir -p "/opt/chromedriver/stable/"
    curl -Lo "/opt/chromedriver/stable/chromedriver_linux64.zip" \
        "https://chromedriver.storage.googleapis.com/$dr/chromedriver_linux64.zip"
    unzip -q "/opt/chromedriver/stable/chromedriver_linux64.zip" -d "/opt/chromedriver/stable/"
    chmod +x "/opt/chromedriver/stable/chromedriver"
    rm -rf "/opt/chromedriver/stable/chromedriver_linux64.zip"
done

echo "Chrome & Chromedriver installed"
I changed the Dockerfile to the following:
FROM public.ecr.aws/lambda/python:3.8 as base

# Hack to install chromium dependencies
RUN yum install -y -q unzip
RUN yum install -y https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

# Install Chromium
COPY install-browser.sh /tmp/
RUN /usr/bin/bash /tmp/install-browser.sh

#FROM public.ecr.aws/lambda/python:3.8

# Install Python dependencies for function
COPY requirements.txt /tmp/
RUN pip install --upgrade pip -q
RUN pip install -r /tmp/requirements.txt -q

COPY app.py ./

CMD [ "app.handler" ]
And finally, the missing dependencies were also solved within this Dockerfile by installing Chrome directly, which pulled in the 122 packages needed to run Chrome.
I've put this in a GitHub Repository here and explained the steps in a blog post here.
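For reference, a hedged sketch of the driver setup that pairs with the /opt layout from install-browser.sh; the flag set is an assumption based on common Lambda-plus-Chromium setups, not copied from the linked repository:
from selenium import webdriver

opts = webdriver.ChromeOptions()
opts.binary_location = '/opt/chrome/stable/chrome'  # Chromium unpacked by install-browser.sh
opts.add_argument('--headless')
opts.add_argument('--no-sandbox')
opts.add_argument('--disable-dev-shm-usage')

# Lambda only allows writes under /tmp, so point every writable dir there:
opts.add_argument('--user-data-dir=/tmp/user-data')
opts.add_argument('--data-path=/tmp/data-path')
opts.add_argument('--disk-cache-dir=/tmp/cache-dir')

driver = webdriver.Chrome(executable_path='/opt/chromedriver/stable/chromedriver',
                          options=opts)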

How do I write my Dockerfile to include chromedriver?

I am a newbie to Dockerfiles as well as Selenium. I was working on web scraping using Selenium and taking a screenshot, and I am trying to dockerize it. This question of mine seems to be answered in a few other questions, but they did not solve my error. FYI, I am using a Windows laptop.
The screenshot code works on my local machine, but the Dockerfile seems to be giving me errors.
I am trying to use chromedriver version 89.0.4389.82.
This is my UPDATED Dockerfile:
FROM python:3.6

RUN pip install --upgrade pip && pip install pytest && pip install pytest-mock && pip install pytest-smtp && pip install mock && \
    pip install schedule && pip install selenium && pip install Selenium-Screenshot && pip install python-dateutil

# For running code
COPY src/screenshotcode.py /

RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
RUN apt-get install libxi6 libgconf-2-4 -y

ENV CHROMEDRIVER_VERSION 2.19
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir -p $CHROMEDRIVER_DIR

# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR

# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH

CMD [ "python", "screenshotcode.py" ]
My screenshot code:
import time
from Screenshot import Screenshot_Clipping
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from email_it import email_it
from environmental_variables import environmental_variables
from error_alert_email import error_alert_email
from selenium import webdriver


def screenshot():
    ob = Screenshot_Clipping.Screenshot()

    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument('--start-fullscreen')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')

    driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")

    print('taking screenshot...')
    img_url = ob.full_Screenshot(driver, path = path, image_name = label)

    print('closing driver...')
    driver.close()


screenshot()
EDIT: I get the following error:
PS C:\Users\me\Documents\Projects\> docker run screenshot
  File "scheduler.py", line 16, in <module>
    from screenshot import screenshot
  File "/screenshotcode.py", line 72, in <module>
    screenshot()
  File "/screenshotcode.py", line 32, in screenshot
    driver = webdriver.Chrome()
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.19.346067 (6abd8652f8bc7a1d825962003ac88ec6a37a82f1),platform=Linux 5.4.72-microsoft-standard-WSL2 x86_64)
You set the chromedriver location in your code to:
driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
but that Windows path does not exist inside the container; your Dockerfile unpacks the driver to a Linux path inside the image. So you need to change your code to point at the in-container location, e.g.
driver = webdriver.Chrome(executable_path = "/usr/local/bin/chromedriver")
(with the Dockerfile above, that location is $CHROMEDRIVER_DIR, i.e. /chromedriver/chromedriver).
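One more point worth checking, hedged because the answer above stops at the path: the Dockerfile pins ENV CHROMEDRIVER_VERSION 2.19, and the error output confirms chromedriver=2.19 is what actually runs, while apt-get installs current google-chrome-stable and the question targets Chrome/chromedriver 89. A 2015-era chromedriver cannot drive Chrome 89, so the session can still crash after the path is fixed until the two versions match. A minimal sketch of the driver setup once a matching chromedriver is on PATH, as the Dockerfile already arranges:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument('--headless')
opts.add_argument('--no-sandbox')  # Chrome runs as root inside the container

# $CHROMEDRIVER_DIR is prepended to PATH in the Dockerfile, so selenium can
# locate `chromedriver` itself; no hard-coded executable_path is needed.
driver = webdriver.Chrome(options=opts)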

Unable to find ssl cert or key file in docker build

I have a Dockerfile that serves a Flask app using gunicorn. For my purposes I need to use HTTPS, so I'm setting up SSL using OpenSSL. However, I keep running into this error:
[2020-02-24 17:01:18 +0000] [1] [INFO] Starting gunicorn 20.0.4
Traceback (most recent call last):
  File "/usr/local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/wsgiapp.py", line 58, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/base.py", line 228, in run
    super().run()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/arbiter.py", line 198, in run
    self.start()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/arbiter.py", line 155, in start
    self.LISTENERS = sock.create_sockets(self.cfg, self.log, fds)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/sock.py", line 162, in create_sockets
    raise ValueError('certfile "%s" does not exist' % conf.certfile)
ValueError: certfile "server.crt" does not exist
Here is my Dockerfile:
FROM ubuntu:latest

RUN apt-get update && apt-get install python3-pip -y && \
    apt-get install python3-dev openssl
RUN openssl req -nodes -new -x509 -keyout server.key -out server.cert -subj "/C=US/ST=MD/L=Columbia/O=Example/OU=ExampleOU/CN=example.com/emailAddress=seanbrhn3@gmail.com"

COPY ./requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip3 install -r requirements.txt
COPY . /app

ENV PORT 8080

CMD ["gunicorn", "--certfile=server.crt", "--keyfile=server.key", "app:app", "--config=config.py"]
All help is most appreciated!
I solved the problem: instead of creating the cert and key in the Dockerfile, I used the COPY command to take my cert and keys from my local directory and put them into the image. My problem was just my lack of knowledge of Docker.
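Two contributing causes are also visible in the Dockerfile itself: the openssl line writes server.cert while the CMD asks gunicorn for server.crt, and the files are generated before WORKDIR /app, so they land in / rather than in the directory gunicorn resolves relative paths against. Since gunicorn config files are plain Python and the CMD already passes --config=config.py, one hedged sketch is to pin absolute paths there and drop the --certfile/--keyfile flags (the paths below assume the keys stay where the openssl line above puts them):
# config.py -- hedged sketch of a gunicorn config with absolute cert paths
bind = '0.0.0.0:8080'
certfile = '/server.cert'  # matches `openssl ... -out server.cert`, run from /
keyfile = '/server.key'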
This probably won't be the case for most people, but I wanted to put this out there just in case. I had this issue and realized that I had forgotten that I was mounting a volume in my docker run command.
So even though I was downloading files in my Dockerfile (during docker build), as soon as I did docker run with the volume mounted (the volume did not contain those downloaded files), all the downloaded files were hidden by the mount, as if they had been deleted.

Pass arguments to scrapy spider through docker run

I have a scrapy + Selenium spider packaged in a Docker container. I want to run that container, passing some arguments to the spider. However, for some reason I receive a strange error message. I did an extensive search and tried many different options before submitting the question.
Dockerfile
FROM python:2.7

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/

# install xvfb
RUN apt-get install -yqq xvfb

# install pyvirtualdisplay
RUN pip install pyvirtualdisplay

# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null

# install scrapy
RUN pip install --upgrade pip && \
    pip install --upgrade \
        setuptools \
        wheel && \
    pip install --upgrade scrapy

# install selenium
RUN pip install selenium==3.8.0

# install xlrd
RUN pip install xlrd

# install bs4
RUN pip install beautifulsoup4

ADD . /tralala/
WORKDIR tralala/

CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI
I guess that the problem may be in the CMD part.
Spider init part:
class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de', ]

    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo
How I run the container:
docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala
I tried both with quotation marks and without them.
The error message:
2018-04-04 16:42:32 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
    self.chunkI = chunkI
NameError: global name 'chunkI' is not defined
Your arguments are stored in kwargs, which is just a dictionary, with keys acting as argument names and values as argument values. It does not define names for you, which is why you get the error.
For more details, see this answer
In your specific case, try self.chunkI = kwargs['chunkI'] and self.chunksNo = kwargs['chunksNo']
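Put together, a hedged sketch of the corrected __init__ (the vdisplay default is kept from the original spider; the kwargs lookups follow the answer above):
import scrapy

class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de', ]

    def __init__(self, vdisplay=True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        # the -a chunksNo=... -a chunkI=... arguments arrive here as kwargs:
        self.chunkI = kwargs['chunkI']
        self.chunksNo = kwargs['chunksNo']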
