Convert HTML files to png in Colab

I want to convert HTML files created with Folium to PNG and finally combine them into a single GIF.
I'm stuck on converting the HTML files to images. So far I've tried the following (on Colab):
Code 1:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.webite-url.com")
import os
import imageio
import webbrowser
Error1:
WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
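The `chrome_options=` keyword and the positional driver path used in Code 1 belong to the Selenium 3 API; they were deprecated in Selenium 4 and removed in later 4.x releases in favour of `options=` and a `Service` object. Below is a minimal sketch of the same Colab setup in that style. It loads a local Folium file through a `file://` URI and saves a screenshot. It assumes chromedriver ends up at `/usr/bin/chromedriver` (as after the `cp` in Code 1), the fixed `delay` is only a guess at how long the map tiles need to load, and `HEADLESS_ARGS`, `make_driver` and `html_to_png` are hypothetical names. This may not by itself cure the DevToolsActivePort crash, which can also come from the Chromium package on the runtime.

```python
import time
from pathlib import Path

# Flags from Code 1; --no-sandbox and --disable-dev-shm-usage are commonly
# needed when Chrome runs as root inside a container such as Colab.
HEADLESS_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def make_driver(driver_path="/usr/bin/chromedriver", width=1200, height=800):
    """Create a headless Chrome driver with the Selenium 4 API (assumed installed)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    # A fixed window size keeps every frame of the later GIF the same size.
    options.add_argument(f"--window-size={width},{height}")
    return webdriver.Chrome(service=Service(driver_path), options=options)


def html_to_png(driver, html_path, png_path, delay=2.0):
    """Open a local HTML file in the driver and save a PNG screenshot of it."""
    # file:// URIs need an absolute path; as_uri() also escapes special characters.
    uri = Path(html_path).resolve().as_uri()
    driver.get(uri)
    time.sleep(delay)  # crude wait for map tiles loaded by JavaScript
    if not driver.save_screenshot(str(png_path)):
        raise RuntimeError(f"screenshot failed for {html_path}")
    return png_path
```

Typical use would be `driver = make_driver()`, then `html_to_png(driver, "osm1.html", "osm1.png")` for each file, and finally `driver.quit()`.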
Code 2:
import os
import subprocess
url="osm1.html"
outfn = "outfig.png"
subprocess.check_call(["{}".format(url), "--out={}".format(outfn)])
Error2:
FileNotFoundError: [Errno 2] No such file or directory: 'osm1.html': 'osm1.html'
(osm1.html is in the root of Colab)
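Code 2 fails because `subprocess.check_call` treats the first list element as the program to execute, so Python tries to run `osm1.html` itself; that is what the `FileNotFoundError` reports. Headless Chrome/Chromium can write a screenshot straight from the command line with its `--screenshot` flag. A sketch that builds such a command, assuming the browser binary is called `chromium-browser` (the name the apt package used on Colab; adjust to whatever is actually installed); `screenshot_cmd` and `html_to_png_cli` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def screenshot_cmd(html_path, png_path, browser="chromium-browser",
                   width=1200, height=800):
    """Build (but do not run) a headless-browser command that screenshots a local file."""
    uri = Path(html_path).resolve().as_uri()  # the browser needs a URL, not a bare name
    return [
        browser,                      # the program to run comes first
        "--headless",
        "--no-sandbox",               # Colab runs as root
        "--disable-gpu",
        f"--window-size={width},{height}",
        f"--screenshot={Path(png_path).resolve()}",
        uri,                          # the page to render comes last
    ]


def html_to_png_cli(html_path, png_path, **kwargs):
    """Run the command; raises CalledProcessError if the browser fails."""
    subprocess.run(screenshot_cmd(html_path, png_path, **kwargs), check=True)
    return png_path
```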
Code 3:
!pip install bokeh
import bokeh
from bokeh.io import export_png
url="osm1.html"
export_png(url, filename="plot.png")
Error3:
ValueError: OutputDocumentFor expects a sequence of Models
Code 4:
!pip install imgkit
!apt-get install xvfb
!apt-get install wkhtmltopdf
import imgkit
imgkit.from_file('osm1.html', 'out.jpg')
Error4:
OSError: wkhtmltoimage exited with non-zero code 1. error:
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
qt.qpa.screen: QXcbConnection: Could not connect to display
Could not connect to any X display.
You need to install xvfb(sudo apt-get install xvfb, yum install xorg-x11-server-Xvfb, etc), then add option: {"xvfb": ""}.
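The last line of Error 4 already names a fix: install xvfb and pass `{"xvfb": ""}` in imgkit's `options`, so that wkhtmltoimage runs under a virtual X display. A sketch of that call; the `javascript-delay` entry is an extra assumption (a wkhtmltoimage option in milliseconds) meant to give Folium's JavaScript time to draw the map, and `imgkit_options`/`html_to_jpg` are hypothetical helper names. The old WebKit engine in wkhtmltoimage may still render Leaflet maps imperfectly.

```python
def imgkit_options(js_delay_ms=2000):
    """Options for imgkit: run under xvfb and wait before capturing."""
    return {
        "xvfb": "",                            # imgkit wraps the call in xvfb-run
        "javascript-delay": str(js_delay_ms),  # wkhtmltoimage waits this many ms
    }


def html_to_jpg(html_path, out_path, js_delay_ms=2000):
    import imgkit  # pip install imgkit; needs wkhtmltopdf and xvfb from apt
    return imgkit.from_file(html_path, out_path, options=imgkit_options(js_delay_ms))
```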
Can any of the attempts above be fixed easily? I don't know any other way to do it.
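For the final step (PNG frames to one GIF), here is a sketch using Pillow. Pillow is an assumption here, since the question only imports `imageio`; Colab ships both. With a plain `sorted`, `osm10.png` comes before `osm2.png`, so the frames are ordered by the number embedded in the file name. The `osm<N>.png` naming pattern and the helper names are hypothetical.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split 'osm10.png' into ['osm', 10, '.png'] so numbers compare numerically."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def ordered_frames(folder, pattern="osm*.png"):
    """List matching PNGs in numeric order (osm1, osm2, ..., osm10)."""
    return sorted(Path(folder).glob(pattern), key=lambda p: natural_key(p.name))


def pngs_to_gif(frames, gif_path, ms_per_frame=500):
    from PIL import Image  # Pillow, assumed installed
    images = [Image.open(f).convert("RGB") for f in frames]
    # save_all + append_images writes an animated GIF; loop=0 repeats forever.
    images[0].save(gif_path, save_all=True, append_images=images[1:],
                   duration=ms_per_frame, loop=0)
    return gif_path
```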

Related

How do I write my Dockerfile to include chromedriver?

I am new to Dockerfiles as well as Selenium. I was working on web scraping with Selenium and taking a screenshot, and I am trying to dockerize it. This question seems to have been answered in a few other questions, but those answers did not solve my error. FYI, I am using a Windows laptop.
The screenshot code works on my local machine, but the Docker container gives me errors.
I am trying to use this version of chromedriver=89.0.4389.82
This is my UPDATED Dockerfile:
FROM python:3.6
RUN pip install --upgrade pip && pip install pytest && pip install pytest-mock && pip install pytest-smtp && pip install mock && \
pip install schedule && pip install selenium && pip install Selenium-Screenshot && pip install python-dateutil
# For running code
COPY src/screenshotcode.py /
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list
RUN apt-get update -y
RUN apt-get install -y google-chrome-stable
RUN apt-get install libxi6 libgconf-2-4 -y
ENV CHROMEDRIVER_VERSION 2.19
ENV CHROMEDRIVER_DIR /chromedriver
RUN mkdir -p $CHROMEDRIVER_DIR
# Download and install Chromedriver
RUN wget -q --continue -P $CHROMEDRIVER_DIR "http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip"
RUN unzip $CHROMEDRIVER_DIR/chromedriver* -d $CHROMEDRIVER_DIR
# Put Chromedriver into the PATH
ENV PATH $CHROMEDRIVER_DIR:$PATH
CMD [ "python", "screenshotcode.py" ]
My screenshot code:
import time
from Screenshot import Screenshot_Clipping
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from email_it import email_it
from environmental_variables import environmental_variables
from error_alert_email import error_alert_email
from selenium import webdriver
def screenshot():
    ob=Screenshot_Clipping.Screenshot()
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument('--start-fullscreen')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
    print('taking screenshot...')
    img_url=ob.full_Screenshot(driver, path = path, image_name = label)
    print('closing driver...')
    driver.close()

screenshot()
EDIT: I get the following error
PS C:\Users\me\Documents\Projects\> docker run screenshot
File "scheduler.py", line 16, in <module>
from screenshot import screenshot
File "/screenshotcode.py", line 72, in <module>
screenshot()
File "/screenshotcode.py", line 32, in screenshot
driver = webdriver.Chrome()
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.19.346067 (6abd8652f8bc7a1d825962003ac88ec6a37a82f1),platform=Linux 5.4.72-microsoft-standard-WSL2 x86_64)

In your code you set chromedriver's location to:
driver = webdriver.Chrome(executable_path = r"C:\Users\me\Documents\Projects\chromedriver.exe")
but that Windows path does not exist inside the container. Your Dockerfile installs chromedriver into $CHROMEDRIVER_DIR (/chromedriver) and adds that directory to PATH,
so you need to change your code to
driver = webdriver.Chrome(executable_path = "/chromedriver/chromedriver")
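Two more things stand out in the question. First, `screenshot()` builds `chrome_options` but never passes them to `webdriver.Chrome`, so Chrome starts without `--headless` and `--no-sandbox`, which commonly makes it crash when running as root in a container. Second, the traceback reports chromedriver 2.19, far older than the Chrome 89 the question targets; ChromeDriver's major version generally needs to match Chrome's, so `CHROMEDRIVER_VERSION` should match the installed `google-chrome-stable`. A sketch of the driver setup under the Selenium 4 API (`Service`, `options=`); the candidate paths and helper names are assumptions.

```python
import shutil
from pathlib import Path

# Hypothetical candidate locations; /chromedriver matches CHROMEDRIVER_DIR in the Dockerfile.
CANDIDATES = ("/chromedriver/chromedriver", "/usr/local/bin/chromedriver", "/usr/bin/chromedriver")


def find_chromedriver(candidates=CANDIDATES, use_path=True):
    """Return the first chromedriver found on PATH or in candidates, else None."""
    if use_path:
        found = shutil.which("chromedriver")
        if found:
            return found
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def make_container_driver():
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver_path = find_chromedriver()
    if driver_path is None:
        raise FileNotFoundError("chromedriver not found on PATH or in CANDIDATES")
    options = webdriver.ChromeOptions()
    for arg in ("--headless", "--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"):
        options.add_argument(arg)
    # Pass the options: the original screenshot() built them but never used them.
    return webdriver.Chrome(service=Service(driver_path), options=options)
```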

Is there a way we can use Selenium on Google Colab like in Jupyter Notebook?

I've been using Selenium with Jupyter Notebook (with the Chrome webdriver) for a while. In Jupyter, a new automated browser window opens and I can watch my code at work, which is very useful when the automation includes selecting an option from a drop-down and the like.
But when I use Selenium (with the Chrome webdriver) on Google Colab, no automated tab or window opens, and I can't see what my code is doing. It feels like being in a dark cave without a torch.
Can anyone tell me how I can see an automated tab of the code while using Selenium on Colab?
This is what I have tried so far:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.python.org")
and this too
!pip install selenium
!apt-get update
!apt install chromium-chromedriver
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver =webdriver.Chrome('chromedriver',chrome_options=chrome_options)
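On Colab the browser runs headless on a remote VM, so there is no window to watch. One workaround is to save a screenshot after each step and display the images in the notebook. A minimal sketch; `StepRecorder` is a hypothetical helper that relies only on Selenium's `save_screenshot`, so it should work with the `wd` driver created above.

```python
from pathlib import Path


class StepRecorder:
    """Save a numbered screenshot after each automation step."""

    def __init__(self, driver, out_dir="steps"):
        self.driver = driver
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.paths = []

    def snap(self, label):
        # A zero-padded index keeps the files in step order when listed.
        path = self.out_dir / f"{len(self.paths):03d}_{label}.png"
        if not self.driver.save_screenshot(str(path)):
            raise RuntimeError(f"could not save {path}")
        self.paths.append(path)
        return path
```

In a Colab cell, `rec = StepRecorder(wd)` followed by `display(Image(filename=str(rec.snap("home"))))`, with `Image` and `display` imported from `IPython.display`, shows the captured page under the cell.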

No permission for chromedriver.exe in Colab

I am trying to use Selenium's webdriver (Python) with Chrome in Google Colab. First, I had problems passing the chromedriver.exe file to the command (selenium.webdriver.Chrome('/chromedriver.exe')). After getting past that, I keep getting a permission failure when running chromedriver.exe, even though the version is correct. Does anyone know what could be wrong?
WebDriverException: Message: 'chromedriver.exe' executable may have wrong permissions.

You can do it by installing the Chromium webdriver and adjusting some options so that it does not crash in Google Colab:
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get("https://www.webite-url.com")
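The permission error in the question has two likely causes worth checking separately: the file may lack the executable bit, and `chromedriver.exe` is the Windows build, which Colab's Linux VM cannot run even with the right permissions (the apt-installed Linux driver above avoids both problems). A sketch that inspects the first bytes of the file (`MZ` marks a Windows executable, `\x7fELF` a Linux one) and adds execute permission; `binary_kind` and `make_executable` are hypothetical helpers.

```python
import os
import stat


def binary_kind(path):
    """Guess the executable format from the file's magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic.startswith(b"MZ"):
        return "windows-pe"
    if magic == b"\x7fELF":
        return "linux-elf"
    return "unknown"


def make_executable(path):
    """Add the execute bit for user, group and others (like chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```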

OSError: [Errno 8] Exec format error: 'chromedriver' using Chromedriver on Ubuntu server

I'm trying to use Chromedriver with Ubuntu (AWS instance). I've gotten Chromedriver to work no problem in a local instance, but having many, many issues doing so in a remote instance.
I'm using the following code:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Chrome(executable_path='/usr/bin/chromedriver', chrome_options=options)
However, I keep getting this error:
Traceback (most recent call last):
File "test.py", line 39, in <module>
driver = webdriver.Chrome()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'chromedriver'
I believe I'm using the latest versions of Selenium, Chrome, and Chromedriver.
Chrome version: 78.0.3904.70 (Official Build) (64-bit)
Selenium:
ubuntu@ip-172-31-31-200:/usr/bin$ pip3 show selenium
Name: selenium
Version: 3.141.0
Summary: Python bindings for Selenium
Home-page: https://github.com/SeleniumHQ/selenium/
Author: UNKNOWN
Author-email: UNKNOWN
License: Apache 2.0
Location: /home/ubuntu/.local/lib/python3.6/site-packages
Requires: urllib3
And, finally, for Chromedriver, I'm almost certain I downloaded the most recent version here: https://chromedriver.storage.googleapis.com/index.html?path=78.0.3904.70/. It's the mac_64 version (I'm using Ubuntu on a Mac). I then placed chromedriver in /usr/bin , as I read that's common practice.
I have no idea why this isn't working. A few possibilities I can think of:
Some sort of access issue? I'm a beginner with the command line and Ubuntu. Should I be running this as the root user?
A mismatch between the Chromedriver and Chrome versions? Is there a way to tell for certain which chromedriver version I have?
Chromedriver and Selenium are in different locations. Selenium is in /home/ubuntu/.local/lib/python3.6/site-packages and I've moved chromedriver to /usr/bin. Could this be causing problems?
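`Exec format error` means the kernel does not recognise the file as a program for this machine, which fits the `mac_64` download: that is a macOS (Mach-O) binary, while Ubuntu needs the `linux64` build. A stdlib sketch to check a downloaded driver before using it; it only knows the common Mach-O magics and three ELF machine codes (0x3E x86-64, 0xB7 AArch64, 0x03 x86), and `describe_binary` is a hypothetical helper.

```python
import struct

ELF_MACHINES = {0x3E: "x86-64", 0xB7: "aarch64", 0x03: "x86"}
# Little-endian Mach-O magics (64-bit and 32-bit) as they appear on disk.
MACHO_MAGICS = {b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"}


def describe_binary(path):
    """Return a short description of the executable format and CPU."""
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] in MACHO_MAGICS:
        return "macOS Mach-O (will not run on Linux)"
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
        (machine,) = struct.unpack(order + "H", header[18:20])  # e_machine field
        return "Linux ELF " + ELF_MACHINES.get(machine, hex(machine))
    return "unknown format"
```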

Ubuntu Server 18.04 LTS (64-bit x86):
Download Chrome: wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
Install Chrome: sudo dpkg -i google-chrome-stable_current_amd64.deb
If you get an error, run: sudo apt-get -f install
Check Chrome: google-chrome --version
Download chromedriver for Linux: wget https://chromedriver.storage.googleapis.com/78.0.3904.70/chromedriver_linux64.zip
Unzip chromedriver (install unzip with sudo apt install unzip if required): unzip chromedriver_linux64.zip
Move chromedriver to /usr/bin: sudo mv chromedriver /usr/bin/chromedriver
Check chromedriver by running: chromedriver
Install Java: sudo apt install default-jre
Install Selenium: sudo pip3 install selenium
Create a test file with nano test.py and paste in the content below. Press CTRL+X, then Y to save and exit. Run the script with python3 test.py
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
driver = None
try:
    driver = webdriver.Chrome(options=options)
    driver.get("https://www.google.com")
    s = driver.find_element(By.NAME, "q")
    assert s.is_displayed() is True
    print("ok")
except Exception as ex:
    print(ex)
finally:
    # Only quit if the driver actually started
    if driver is not None:
        driver.quit()
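On the question's version concern: `chromedriver --version` and `google-chrome --version` both print a dotted version, and the driver's major version should match the browser's (78 here). A sketch that compares the two; it shells out to those commands (names assumed to match the install steps above), assumes the usual `ChromeDriver 78.0.3904.70 (...)` / `Google Chrome 78.0.3904.70` output, and avoids `capture_output` so it also runs on the Python 3.6 shown in the question.

```python
import re
import subprocess


def major_version(text):
    """Return the first number of the first dotted version found in text."""
    match = re.search(r"(\d+)\.\d+", text)
    if match is None:
        raise ValueError(f"no version in {text!r}")
    return int(match.group(1))


def command_output(cmd):
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def versions_match(driver_cmd="chromedriver", browser_cmd="google-chrome"):
    """Return (match, driver_major, browser_major)."""
    driver = major_version(command_output([driver_cmd, "--version"]))
    browser = major_version(command_output([browser_cmd, "--version"]))
    return driver == browser, driver, browser
```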
Example of using Docker and selenium/standalone-chrome:
Install Docker.
Start a container, for example with sudo docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-xenon (other image tags and options are available).
Open the Security Group of your instance in AWS and add a TCP rule so that you can connect. You can allow only your own IP and port 4444 for Selenium.
Run the test from your local machine:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--remote-debugging-port=9222")
# your_instance_ip is a placeholder for the EC2 instance's public IP
driver = webdriver.Remote(command_executor="http://your_instance_ip:4444/wd/hub",
                          options=options)

I am running the following on an EC2 Ubuntu instance:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=options) #Give the full path to chromedriver
Try it. If it doesn't work, I can dig up more of the settings.

Python Selenium Geckodriver Connection refused

I have spent hours trying to make Selenium work with Python, with no luck.
I get this error message:
selenium.common.exceptions.WebDriverException: Message: connection refused
This is the example I used:
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get('http://www.python.org')
browser.close()
These are the dependencies I installed:
apt-get install -y xorg xvfb dbus-x11 xfonts-100dpi xfonts-75dpi xfonts-cyrillic
This is the /root/geckodriver.log output:
1493938773101 geckodriver INFO Listening on 127.0.0.1:40876
1493938774156 geckodriver::marionette INFO Starting browser /usr/lib/firefox/firefox.sh with args ["-marionette"]
(firefox:3128): GLib-GObject-CRITICAL **: g_object_ref: assertion 'object->ref_count > 0' failed
I'm running Selenium on an Ubuntu 14.04 64-bit remote VPS with 128 MB of RAM.
I can't figure out what prevents Selenium from communicating with the browser drivers, for both Chrome and Firefox.

Start by checking your Firefox browser version.
I found this very confusing at one point. I'm using Raspbian, and the "Iceweasel" installed with apt-get was Firefox 52, which didn't work with geckodriver 0.19 (that release requires Firefox 55 or later).
What worked for me was downloading geckodriver v0.16, which resolved the problem.
Moreover, you probably don't need xorg to make it work; the only packages I needed were xvfb and iceweasel.
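The version check suggested above can be scripted. A sketch that reads the major version from `firefox --version` output (typically `Mozilla Firefox 52.0`) and compares it with the minimum a given geckodriver release needs. The table holds only the pairing stated in this answer (geckodriver 0.19 needs Firefox 55 or newer); other rows would have to come from Mozilla's geckodriver support table, and the helper names are hypothetical.

```python
import re

# Minimum Firefox major version per geckodriver release, as stated in the answer.
# Extend from the official geckodriver support table before relying on it.
MIN_FIREFOX = {"0.19": 55}


def firefox_major(version_text):
    """Extract 52 from 'Mozilla Firefox 52.0'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version in {version_text!r}")
    return int(match.group(1))


def supports(geckodriver_version, version_text):
    """True/False if the pairing is known, None if the table has no entry."""
    minimum = MIN_FIREFOX.get(geckodriver_version)
    if minimum is None:
        return None
    return firefox_major(version_text) >= minimum
```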

OK, I gave up on geckodriver and now use PhantomJS as my webdriver.
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(800, 600))
display.start()
# webdriver.PhantomJS needs an older Selenium (pinned to 2.53.1 below)
driver = webdriver.PhantomJS()
driver.get('http://www.python.org')
html_source = driver.page_source
print("html_source:", html_source)
driver.quit()
display.stop()
Here are the steps I used to install PhantomJS:
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
tar xvjf $PHANTOM_JS.tar.bz2
mv $PHANTOM_JS /usr/local/share
ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
Python and Selenium setup:
apt-get install python-pip -y
pip uninstall pyvirtualdisplay
apt-get install x11vnc xvfb fluxbox
Xvfb :99 -ac
xvfb-run -a python 99.py
pip uninstall selenium
pip install selenium==2.53.1
See also How to install PhantomJS on Ubuntu.
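`xvfb-run -a` starts its own Xvfb server on a free display number and sets `DISPLAY` for the command, so the separate `Xvfb :99 -ac` step is not strictly needed when the script is launched that way. A sketch that relaunches a script under `xvfb-run` from Python, assuming `xvfb-run` is on `PATH` after `apt-get install xvfb`; `xvfb_cmd` and `run_with_xvfb` are hypothetical helpers, and `99.py` is the script name from the answer.

```python
import subprocess
import sys


def xvfb_cmd(script, *args, screen="1280x1024x24"):
    """Build an xvfb-run command that runs script with the current Python."""
    return [
        "xvfb-run",
        "-a",                           # pick a free display number
        "-s", f"-screen 0 {screen}",    # arguments passed on to Xvfb
        sys.executable, script, *args,
    ]


def run_with_xvfb(script, *args):
    return subprocess.run(xvfb_cmd(script, *args), check=True)
```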
