Pass arguments to scrapy spider through docker run

I have a Scrapy + Selenium spider packaged in a Docker container. I want to run that container and pass some arguments to the spider, but I get a NameError (traceback below). I searched extensively and tried many different options before posting this question.
Dockerfile
FROM python:2.7
# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# install xvfb
RUN apt-get install -yqq xvfb
# install pyvirtualdisplay
RUN pip install pyvirtualdisplay
# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null
#install scrapy
RUN pip install --upgrade pip && \
pip install --upgrade \
setuptools \
wheel && \
pip install --upgrade scrapy
# install selenium
RUN pip install selenium==3.8.0
# install xlrd
RUN pip install xlrd
# install bs4
RUN pip install beautifulsoup4
ADD . /tralala/
WORKDIR tralala/
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI
I suspect the problem is in the CMD line.
Spider init part:
class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de',]
    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo
How I run the container:
docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala
I tried both with and without the quotation marks.
The error message:
2018-04-04 16:42:32 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
self.spider = self._create_spider(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
return self.spidercls.from_crawler(self, *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
spider = cls(*args, **kwargs)
File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
self.chunkI = chunkI
NameError: global name 'chunkI' is not defined

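Your arguments are stored in kwargs, which is just a dictionary: each key is an argument name and each value is that argument's value. Python does not create local variables from that dictionary, so the bare names chunkI and chunksNo are undefined inside __init__, which is why you get the NameError.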
For more details, see this answer
In your specific case, try self.chunkI = kwargs['chunkI'] and self.chunksNo = kwargs['chunksNo']
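A minimal sketch of an alternative, assuming you would rather declare the arguments explicitly than index into kwargs. SpiderStandIn is a hypothetical stand-in for scrapy.Spider so the example runs without Scrapy installed; as far as the Scrapy documentation describes it, the real base class likewise copies any extra keyword arguments onto the instance as attributes. Values passed with -a arrive as strings, so the sketch converts them to int, which is an assumption about how the spider uses them.

```python
class SpiderStandIn(object):
    """Hypothetical stand-in for scrapy.Spider (not the real class)."""
    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Copy any remaining spider arguments onto the instance.
        self.__dict__.update(kwargs)


class Crawler(SpiderStandIn):  # in the real project: class Crawler(scrapy.Spider)
    name = "personel_spider_mpc"
    allowed_domains = ["tralala.de"]

    def __init__(self, chunkI=None, chunksNo=None, vdisplay=True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        if chunkI is None or chunksNo is None:
            raise ValueError("pass both -a chunkI=... and -a chunksNo=...")
        # "-a chunkI=1" delivers the string "1", so convert explicitly.
        self.chunkI = int(chunkI)
        self.chunksNo = int(chunksNo)
        self.vdisplay = vdisplay


# Equivalent of: scrapy crawl personel_spider_mpc -a chunksNo=10 -a chunkI=1
spider = Crawler(chunkI="1", chunksNo="10")
```

Declaring the parameters by name also makes a missing argument fail with a clear message instead of a KeyError.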

Related

How to run Chrome Headless in Docker Container with Selenium?

I am trying to run a simple test file that opens google.com in Chrome inside an openjdk Docker container and returns "Test Completed Successfully" upon completion. However, I keep receiving the same error: 'Service' object has no attribute 'process'. This is the full traceback:
Traceback (most recent call last):
File "/NewJersey/test.py", line 60, in <module>
print(main())
^^^^^^
File "/NewJersey/test.py", line 42, in main
driver = webdriver.Chrome(service = service, options=chrome_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
super().__init__(
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/chromium/webdriver.py", line 103, in __init__
self.service.start()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 106, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.11/dist-packages/selenium/webdriver/common/service.py", line 117, in assert_process_still_running
return_code = self.process.poll()
^^^^^^^^^^^^
AttributeError: 'Service' object has no attribute 'process'
This is the code I am running:
#General Imports
from logging import error
import os
import sys
import time
import os.path
import random
#Selenium Imports (Chrome)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
#ChromeDriver Import
from webdriver_manager.chrome import ChromeDriverManager
def main():
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-gpu")
    service = ChromeService("/chromedriver")
    driver = webdriver.Chrome(service = service, options=chrome_options)
    try:
        completion_msg = reroute(driver)
        print(completion_msg)
        driver.close()
        return "Test Completed Successfully"
    except error as Error:
        return Error
def reroute(driver):
    driver.get("https://www.google.com")
    return "Success"
if __name__ == "__main__":
    print(main())
This is my Dockerfile:
# syntax=docker/dockerfile:1
FROM openjdk:11
ENV PATH = "${PATH}:/chromedriver/chromedriver.exe"
RUN apt-get update && apt-get install -y \
software-properties-common \
unzip \
curl \
xvfb \
wget \
bzip2 \
snapd
# Chrome
RUN apt-get update && \
apt-get install -y gnupg wget curl unzip --no-install-recommends && \
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list && \
apt-get update -y && \
apt-get install -y google-chrome-stable && \
CHROMEVER=$(google-chrome --product-version | grep -o "[^\.]*\.[^\.]*\.[^\.]*") && \
DRIVERVER=$(curl -s "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_$CHROMEVER") && \
wget -q --continue -P /chromedriver "http://chromedriver.storage.googleapis.com/$DRIVERVER/chromedriver_linux64.zip" && \
unzip /chromedriver/chromedriver* -d /chromedriver
# Python
RUN apt-get update && apt-get install -y \
python2.7 \
python-setuptools \
python3-pip
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD python3 test.py
When I first started my project, I attempted to do it with firefox but due to certain limitations chose to switch to chrome.
After some research, I found suggestions to pass the path of chromedriver to the Service object and to add the path of chromedriver to the PATH in the Docker container, both of which I have already done as shown above. I continue to get the exact same error.
I haven't been able to find any other solutions to the above issue so I would greatly appreciate any help!

In case anyone else stumbles across this with a similar problem, this is how I solved it.
I simply removed the Service object entirely. For whatever reason, the Service object wasn't configured correctly, and it wasn't needed once I had added the ChromeDriver path to the PATH in the Dockerfile. The code snippet now reads like this:
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)
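One possible explanation, offered as a guess from the posted files rather than a confirmed diagnosis: the Dockerfile unzips ChromeDriver into the /chromedriver directory, so ChromeService("/chromedriver") may have pointed at a directory instead of the binary inside it. The sketch below keeps the Service object but checks the path first. resolve_chromedriver and make_driver are hypothetical helper names, and the code assumes Selenium 4, where Service accepts executable_path.

```python
import os


def resolve_chromedriver(path, binary_name="chromedriver"):
    """Return the path of a usable chromedriver executable, or raise IOError."""
    # If a directory was given (e.g. "/chromedriver"), look for the binary inside it.
    if os.path.isdir(path):
        path = os.path.join(path, binary_name)
    if not os.path.isfile(path):
        raise IOError("chromedriver not found at %s" % path)
    if not os.access(path, os.X_OK):
        raise IOError("chromedriver at %s is not executable" % path)
    return path


def make_driver(driver_path="/chromedriver"):
    # Imported lazily so resolve_chromedriver works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service as ChromeService

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")
    service = ChromeService(executable_path=resolve_chromedriver(driver_path))
    return webdriver.Chrome(service=service, options=options)
```

Failing early with an explicit message about the path is easier to debug than an AttributeError raised from inside Selenium.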

Running a Selenium webscraper in AWS Lambda using Docker

I am trying to create a simple Python Lambda app using the SAM CLI that gets the number of followers for a particular handle. I have gone through every tutorial and blog post I could find and still have not been able to make it work.
The build and local test work fine using sam build and sam local invoke; however, after deployment to AWS Lambda the function throws the following error.
Any ideas how to solve this?
{
"errorMessage": "Message: unknown error: Chrome failed to start: crashed.\n (chrome not reachable)\n (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n",
"errorType": "WebDriverException",
"stackTrace": [
" File \"/var/task/app.py\", line 112, in main\n data = Twitter(StockInfo_list)\n",
" File \"/var/task/app.py\", line 36, in Twitter\n driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)\n",
" File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n RemoteWebDriver.__init__(\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n self.start_session(capabilities, browser_profile)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n response = self.execute(Command.NEW_SESSION, parameters)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n self.error_handler.check_response(response)\n",
" File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n raise exception_class(message, screen, stacktrace)\n"
]
}
I'm using the following as my Dockerfile
FROM public.ecr.aws/lambda/python:3.8
# Update repository and install unzip
RUN yum update -y
RUN yum install unzip -y
# Download and install Google Chrome
COPY curl https://intoli.com/install-google-chrome.sh | bash
# Download and install ChromeDriver
RUN CHROME_DRIVER_VERSION=`curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
wget -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip && \
unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN echo $(chromedriver --version)
# Upgrade PIP
RUN /var/lang/bin/python3.8 -m pip install --upgrade pip
# Install requirements (including selenium)
COPY app.py requirements.txt ./
RUN python3.8 -m pip install -r requirements.txt -t .
# Command can be overwritten by providing a different command in the template directly.
CMD ["app.main"]
My application looks like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import os
import time, datetime
import pandas as pd
import csv
# import mysql.connector
from datetime import date, datetime as dt1
def Twitter(twitter_stock_id):
    twitterlist = []
    stockids = []
    twitterlist = twitter_stock_id["TwitterUrl"].str.lower().tolist()
    stockids = twitter_stock_id["stockid"].str.lower().tolist()
    chrome_optionsdata = webdriver.ChromeOptions()
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_optionsdata.add_experimental_option("prefs", prefs)
    chrome_optionsdata.add_argument("--headless")
    chrome_optionsdata.add_argument("--no-sandbox")
    chrome_optionsdata.add_argument("--disable-dev-shm-usage")
    chrome_optionsdata.add_argument("--disable-gpu")
    chrome_optionsdata.add_argument("--disable-gpu-sandbox")
    chromedriver_path = "/usr/local/bin/chromedriver"
    driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_optionsdata)
    lfollowers = []
    for url in twitterlist:
        tempurl = driver.get(url)
        time.sleep(10)
        followers = driver.find_elements_by_class_name("r-qvutc0")
        for fol in followers:
            if "Followers" in fol.text:
                tempstr = fol.text.split(" ")
                lfollowers.append(tempstr[0])
        time.sleep(5)
    lmaindata = []
    for fld in lfollowers:
        if fld != "Followers":
            lmaindata.append(fld)
    print("Followers" + str(lmaindata))
    driver.quit()
    return f"Followers: {lmaindata}"
import json
def main(event, context):
    StockInfo_list = pd.DataFrame([{"TwitterUrl": "https://twitter.com/costco", "stockid": "COST"}])
    data = Twitter(StockInfo_list)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello", "data": data}),
    }

There were three key reasons why this script didn't work:
Lambda only allows writes to the /tmp/ folder
The executables were not in a location on the PATH
Chromium was missing dependencies
To fix this,
I adapted a shell script (copied to /tmp/ in the image) that downloads a specific version of Chromium and a compatible ChromeDriver and installs them under /opt/.
#!/usr/bin/bash
declare -A chrome_versions
# Enter the list of browsers to be downloaded
### Using Chromium as documented here - https://www.chromium.org/getting-involved/download-chromium
chrome_versions=( ['89.0.4389.47']='843831' )
chrome_drivers=( "89.0.4389.23" )
#firefox_versions=( "86.0" "87.0b3" )
#gecko_drivers=( "0.29.0" )
# Download Chrome
for br in "${!chrome_versions[@]}"
do
echo "Downloading Chrome version $br"
mkdir -p "/opt/chrome/stable"
curl -Lo "/opt/chrome/stable/chrome-linux.zip" \
"https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F${chrome_versions[$br]}%2Fchrome-linux.zip?alt=media"
unzip -q "/opt/chrome/stable/chrome-linux.zip" -d "/opt/chrome/stable/"
mv /opt/chrome/stable/chrome-linux/* /opt/chrome/stable/
rm -rf /opt/chrome/stable/chrome-linux "/opt/chrome/stable/chrome-linux.zip"
done
# Download Chromedriver
for dr in ${chrome_drivers[@]}
do
echo "Downloading Chromedriver version $dr"
mkdir -p "/opt/chromedriver/stable/"
curl -Lo "/opt/chromedriver/stable//chromedriver_linux64.zip" \
"https://chromedriver.storage.googleapis.com/$dr/chromedriver_linux64.zip"
unzip -q "/opt/chromedriver/stable//chromedriver_linux64.zip" -d "/opt/chromedriver/stable/"
chmod +x "/opt/chromedriver/stable/chromedriver"
rm -rf "/opt/chromedriver/stable/chromedriver_linux64.zip"
done
echo "Chrome & Chromedriver installed"
Changed the Dockerfile to the following
FROM public.ecr.aws/lambda/python:3.8 as base
# Hack to install chromium dependencies
RUN yum install -y -q unzip
RUN yum install -y https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
# Install Chromium
COPY install-browser.sh /tmp/
RUN /usr/bin/bash /tmp/install-browser.sh
#FROM public.ecr.aws/lambda/python:3.8
# Install Python dependencies for function
COPY requirements.txt /tmp/
RUN pip install --upgrade pip -q
RUN pip install -r /tmp/requirements.txt -q
COPY app.py ./
CMD [ "app.handler" ]
Finally, the missing dependencies were also resolved in this Dockerfile: installing Chrome directly from the RPM pulled in the 122 packages that Chrome needs to run.
I've put this in a GitHub Repository here and explained the steps in a blog post here.
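A rough sketch of how the application code might then point Selenium at the binaries installed under /opt/. The two paths are inferred from the install script above (the Chromium zip is assumed to unpack an executable named chrome), the /tmp sub-directory names are made up for illustration, and lambda_chrome_arguments and make_driver are hypothetical helpers. The executable_path keyword matches the Selenium 3 style call used in the question.

```python
import os

# Paths inferred from install-browser.sh; adjust if your layout differs.
CHROME_BINARY = "/opt/chrome/stable/chrome"
CHROMEDRIVER = "/opt/chromedriver/stable/chromedriver"


def lambda_chrome_arguments(tmp_dir="/tmp"):
    """Build Chrome flags that keep all writes inside Lambda's writable /tmp."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        # Directory names below are illustrative, not required values.
        "--user-data-dir=" + os.path.join(tmp_dir, "chrome-user-data"),
        "--disk-cache-dir=" + os.path.join(tmp_dir, "chrome-cache"),
    ]


def make_driver():
    # Imported lazily so the flag helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for argument in lambda_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(executable_path=CHROMEDRIVER, options=options)
```

Passing both paths explicitly avoids relying on PATH, which was one of the three problems listed above.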

Elastalert deployment failed

I have installed ElastAlert on CentOS 7.6, and when starting ElastAlert I receive the following error.
[root@e2e-27-36 elastalert]# python -m elastalert.elastalert --verbose --rule example_rules/example_frequency.yaml
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/elastalert/elastalert/elastalert.py", line 29, in <module>
from . import kibana
File "elastalert/kibana.py", line 4, in <module>
import urllib.error
ImportError: No module named error
How should I go about fixing this?

You can check whether urllib3 is installed by running pip freeze, or try reinstalling it with pip install urllib3.
You may also need to activate your virtual environment first, like this: source [env]/bin/activate.

Setup conda environment
conda create -n elastalert python=3.6 anaconda
Activate conda env
conda activate elastalert
Install all the requirements
pip install -r requirements-dev.txt
pip install -r requirements.txt

I found the fix on my own.
1. On Python 2.7 the issue persists.
2. Install Python 3.6 to fix it:
yum install python3 python3-devel python3-urllib3
3. Run the elastalert command:
python3 -m elastalert.elastalert --config /root/elastalert/config.yaml --verbose --rule /root/elastalert/example_rules/example_frequency.yaml
4. This may fail with a missing module (ModuleNotFoundError: No module named 'pytz').
5. Install the modules listed in the requirements file:
pip3 install -r /root/elastalert/requirements.txt
6. Running the same command again gave this error:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='elasticsearch.example.com', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))
7. This error is caused by an invalid hostname in config.yaml. Edit config.yaml and set the es_host field to the server's hostname.
Make sure the hostname also has an entry in /etc/hosts.
8. With that fixed, running the same command produced one more error:
pkg_resources.DistributionNotFound: The 'jira>=2.0.0'
9. Install jira with the command below:
pip3 install jira==2.0.0
10. Running the same command produced yet another error:
elasticsearch.exceptions.TransportError: TransportError(429, 'circuit_breaking_exception', '[parent] Data too large, data for [] would be [994793504/948.7mb], which is larger than the limit of [986061209/940.3mb], real usage: [994793056/948.7mb], new bytes reserved: [448/448b]')
11. Fix this by raising the heap size in /etc/elasticsearch/jvm.options:
-Xms1g to -Xms2g
-Xmx1g to -Xmx2g
and restart the Elasticsearch service with "service elasticsearch restart".
12. Running the same command then produced another error:
ERROR:root:Error finding recent pending alerts: NotFoundError(404, 'index_not_found_exception', 'no such index [elastalert_status]', elastalert_status, index_or_alias) {'query': {'bool': {'must': {'query_string': {'query': '!exists:aggregate_id AND alert_sent:false'}}, 'filter': {'range': {'alert_time': {'from': '2019-12-04T19:45:09.635478Z', 'to': '2019-12-06T19:45:09.635529Z'}}}}}, 'sort': {'alert_time': {'order': 'asc'}}}
13. Fix this by creating the index with the command below:
elastalert-create-index
14. Finally, everything works when running:
python3 -m elastalert.elastalert --config /root/elastalert/config.yaml --verbose --rule /root/elastalert/example_rules/example_frequency.yaml
I then cancelled the command and ran it in the background:
python3 -m elastalert.elastalert --config /root/elastalert/config.yaml --verbose --rule /root/elastalert/example_rules/example_frequency.yaml &
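The original ImportError comes from urllib.error, which is a Python 3 standard library module with no equivalent of that name in Python 2.7. A small check like the sketch below can confirm which interpreter a given python command actually runs; has_urllib_error is a hypothetical helper name.

```python
import sys


def has_urllib_error():
    """Return True if this interpreter can import urllib.error (Python 3 only)."""
    try:
        import urllib.error  # noqa: F401
    except ImportError:
        return False
    return True


major_version = sys.version_info[0]
print("Python %d, urllib.error available: %s" % (major_version, has_urllib_error()))
```

Running it with both python and python3 shows quickly whether the command used to start ElastAlert is the Python 2 interpreter.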

WSL, Docker + Python logger -- IOError: [Errno 2] No such file or directory

Running Docker for Windows (Version 18.06.1-ce-win73 (19507))
Calling a behave application (python testing framework) using docker compose file:
version: "3"
services:
  behave:
    build:
      context: .
    environment:
      NODE_ENV: test
      DB_DATABASE: testdb
    volumes:
      - ".:/app"
    command:
      - bash
      - run_test.sh
      - docker
      - --capture
      - --stop
      - ${FEATURE:-feature}/
      - ${TAGS}
    network_mode: host
The Dockerfile is pretty vanilla:
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get -y install build-essential software-properties-common curl bzip2 libfreetype6 libfontconfig wget libcurl4-openssl-dev
RUN cd /usr/local/share \
&& wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2 \
&& tar xjf phantomjs-1.9.7-linux-x86_64.tar.bz2 \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/share/phantomjs \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs \
&& ln -s /usr/local/share/phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/bin/phantomjs
RUN apt-get -y install python python-dev python-setuptools python-pycurl python-tz python-pymongo python-cffi python-openssl python-pip
ADD . /app
RUN cd /app \
&& apt-get -y install python-httplib2 \
&& pip install -r requirements.txt \
&& python -m easy_install --upgrade pyOpenSSL
WORKDIR /app
ENV HOME=/app
In the behave app, I use the standard python logging as such:
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='w')
fileHandler.setLevel(flevel)
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)
This runs fine in linux directly. And, it also runs fine in WSL directly (i.e. not via docker).
The failure is:
File "/app/testlib/log_helpers/__init__.py", line 55, in get_dual_logger
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='a')
File "/usr/lib/python2.7/logging/__init__.py", line 913, in __init__
StreamHandler.__init__(self, self._open())
File "/usr/lib/python2.7/logging/__init__.py", line 944, in _open
stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/app/logs/behave.log'
To try and see if there was anything strange, I added a simple print statement in /usr/lib/python2.7/logging/__init__.py before it opens the file. As my test starts, it prints:
Opening /app/logs/behave.log with: a
Opening /app/logs/behave.log with: a
before failing (so the first open works, the second fails).
If I update my logger to always write to a unique file with code like this:
while os.path.exists(BASEDIR + "logs/" + name + ".log"):
    name += "_"
fileHandler = logging.FileHandler(BASEDIR + "logs/" + name + ".log", mode='a')
...
then it works fine.
Opening /app/logs/behave.log with: a
Opening /app/logs/behave_.log with: a
Opening /app/logs/behave__.log with: a
Opening /app/logs/behave___.log with: a
Opening /app/logs/behave____.log with: a
Opening /app/logs/behave_____.log with: a
Opening /app/logs/behave______.log with: a
Opening /app/logs/behave_______.log with: a
Opening /app/logs/behave________.log with: a
Opening /app/logs/behave_________.log with: a
Opening /app/logs/behave__________.log with: a
Opening /app/logs/behave___________.log with: a
Opening /app/logs/behave____________.log with: a
So, the core of the problem is this standard library call (in logging/__init__.py):
stream = open(self.baseFilename, self.mode)
Whether I use mode "w" or "a", the second time Python opens a log file inside a Docker container running on WSL, the open fails.
Has anybody seen anything like this in WSL? Is there a workaround?
This seems very specific to my use case, so I'm not sure whether it is a known issue.
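One workaround that might be worth trying, offered as a sketch rather than a diagnosis of the WSL behaviour: since only the second open of the same path fails, the helper could cache the FileHandler so each log file is opened once per process. The signatures of get_file_handler and get_dual_logger below are assumptions; only the name get_dual_logger appears in the traceback.

```python
import logging
import os

# Cache of already-opened handlers, keyed by log file path.
_file_handlers = {}


def get_file_handler(log_dir, name, level=logging.DEBUG):
    """Return one shared FileHandler per log file, opening the file only once."""
    path = os.path.join(log_dir, name + ".log")
    handler = _file_handlers.get(path)
    if handler is None:
        if not os.path.isdir(log_dir):
            os.makedirs(log_dir)
        handler = logging.FileHandler(path, mode="a")
        handler.setLevel(level)
        _file_handlers[path] = handler
    return handler


def get_dual_logger(log_dir, name):
    """Attach the shared file handler to the named logger at most once."""
    logger = logging.getLogger(name)
    handler = get_file_handler(log_dir, name)
    if handler not in logger.handlers:
        logger.addHandler(handler)
    return logger
```

Calling get_dual_logger repeatedly with the same arguments then reuses the existing handler instead of opening /app/logs/behave.log a second time.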

headless chrome in docker with python. Chrome failed to start: crashed

I want to run this simple script inside a docker container:
def hi_chrome():
    from xvfbwrapper import Xvfb
    from splinter import Browser
    vdisplay = Xvfb()
    vdisplay.start()
    print "spawning connector"
    oBrowser = Browser('chrome')
    oBrowser.visit("http://google.co.za")
    assert oBrowser.title == "Google"
    print "yay"
    vdisplay.stop()
if __name__ == '__main__':
    hi_chrome()
I've gotten the script to run in a virtual environment by doing all the pip and apt-get installs listed in my Dockerfile and just running the script. But when I try to run it inside a container I get:
Traceback (most recent call last):
File "app.py", line 19, in <module>
hi_chrome()
File "app.py", line 10, in hi_chrome
oBrowser = Browser('chrome')
File "/usr/local/lib/python2.7/dist-packages/splinter/browser.py", line 63, in Browser
return driver(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/chrome.py", line 31, in __init__
self.driver = Chrome(chrome_options=options, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed
(Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 4.8.0-34-generic x86_64)
I've had similar problems when trying to run my script using other containers from Docker Hub. I've tried using Chrome instead of Chromium, and I've tried several containers I found on Docker Hub, but I keep finding broken nonsense. This should be simple.
My main suspicion is that it's a versioning issue. But it works in the venv, so that doesn't make much sense. Or Docker just needs something extra to get the Chrome webdriver to run.
Can someone please point out my obvious and noobish mistake?
My Dockerfile looks like
FROM ubuntu:16.04
RUN apt-get update -y && \
apt-get install -y python-pip python-dev xvfb chromium-browser && \
pip install --upgrade pip setuptools
RUN pip install chromedriver_installer
COPY ./requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install -r requirements.txt
COPY . /app
ENTRYPOINT [ "python" ]
CMD [ "app.py" ]
And requirements.txt:
splinter==0.7.5
xvfbwrapper==0.2.8

I found an image that sorta worked and then beat it into submission... Nice thing about this solution is it doesn't need xvfbwrapper so it's nice and simple.
App.py
def hi_chrome():
    # from xvfbwrapper import Xvfb
    from splinter import Browser
    # vdisplay = Xvfb()
    # vdisplay.start()
    print "spawning connector"
    oBrowser = Browser('chrome')
    oBrowser.visit("http://google.co.za")
    assert oBrowser.title == "Google"
    print "yay"
    # vdisplay.stop()
if __name__ == '__main__':
    hi_chrome()
requirements:
splinter==0.7.5
Dockerfile
FROM markadams/chromium-xvfb
RUN apt-get update && apt-get install -y \
python python-pip curl unzip libgconf-2-4
ENV CHROMEDRIVER_VERSION 2.26
RUN curl -SLO "https://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip" \
&& unzip "chromedriver_linux64.zip" -d /usr/local/bin \
&& rm "chromedriver_linux64.zip"
COPY requirements.txt /usr/src/app/requirements.txt
WORKDIR /usr/src/app
RUN pip install -r requirements.txt
COPY . /usr/src/app
ENTRYPOINT [ "python" ]
CMD [ "app.py" ]
