Google Cloud Run and web scraping with Selenium

I am trying to run a web scraping script on Google Cloud that imports data into Google Sheets through the Sheets API. From a quick look around, Google Cloud Run seems able to do this. Unfortunately, the script requires Selenium and a headless browser, and I'm coming up short on how to set this up properly. I am quite new to serverless scripts and can't find much on the web that is specific enough to my needs with regard to Selenium.
I have been able to Dockerize the script and its dependencies, including Selenium with headless Chrome. However, I'm wondering what the best way to deploy this on Google Cloud would be.
Any thoughts would be appreciated.

See https://dev.to/googlecloud/using-headless-chrome-with-cloud-run-3fdp for an in-depth example of how to use Headless Chrome + Selenium on Cloud Run.
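As a rough illustration of what deploying on Cloud Run involves: a Cloud Run service container is expected to answer HTTP requests on the port given in the PORT environment variable. The sketch below is not taken from the linked article. It wraps a scraping routine in a small standard-library HTTP server, and run_scrape is a hypothetical placeholder for the Dockerized Selenium script from the question.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def listen_port(env=os.environ):
    """Return the port to listen on; Cloud Run passes it in PORT."""
    return int(env.get("PORT", "8080"))


def run_scrape():
    """Hypothetical placeholder: call the Selenium scraping code here."""
    return "scrape finished"


class ScrapeHandler(BaseHTTPRequestHandler):
    """Runs one scrape per incoming GET request."""

    def do_GET(self):
        body = run_scrape().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Bind to all interfaces so requests from outside the container arrive.
    HTTPServer(("0.0.0.0", listen_port()), ScrapeHandler).serve_forever()
```

The container's entrypoint would call main(). Each request then triggers one scrape, so something like a scheduler hitting the service URL could run the script periodically.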

Related

Deployed web app has problems in Chrome but not in Microsoft Edge. Why?

I recently deployed my web app on the hosting service pythonanywhere.com. The website works fine when opened in Microsoft Edge. However, when I open the site in Chrome, it does not load the majority of the images, animations, and text. In addition, the website works well in Chrome when it runs from my computer (not localhost, just a regular link). It is built with Flask (Python) and HTML/CSS/JavaScript. I do not know where to start troubleshooting this problem. Does anyone have any advice?
I tried to change the HTML files for each page from Microsoft Edge HTML files, which is how they are saved, to Chrome files. I have yet to fully pursue that potential solution: I am using git to transfer the files to pythonanywhere, so I think I need to change the files in git and not just on my computer.

How to run Selenium on an actual web server?

I have no idea how to do this, and none of the documentation I could find through Google helped. A while back I was introduced to Selenium through this tutorial, and now that I'm more comfortable with it, I want my Selenium "bot" to run on a web server 24/7, receiving orders from me through Facebook Messenger (something I already have working on my local machine).
I tried to find answers online and was overwhelmed by the amount of information, without finding anything that is clear to understand. All the pages I've been through require me to learn about a large array of things and are very specific about their tools. Sometimes I try to follow along, only to get an error that I don't understand and that the page doesn't explain how to fix.
I also asked this question on Reddit, only to be downvoted without an answer. I have no idea how to run Selenium + Chrome on a server.
Take me for the stupidest person on earth: how can I do this, in the clearest possible steps? I'd prefer to use Chrome with Selenium, through Python or PHP.

You can try running your chromedriver headlessly. I was introduced to it by this tutorial. A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to your local browser, and you can take screenshots too.
If the headless browser gives you an error that can't be resolved (like a screen sharing error), you can try a platform such as AWS or Google Cloud.
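A minimal sketch of headless Chrome with a screenshot, assuming Selenium 4 and a matching Chrome/chromedriver are installed on the server. The function names, window size, and output path are illustrative, not taken from the tutorial mentioned above. The --headless=new flag assumes a recent Chrome; older versions use plain --headless.

```python
def headless_chrome_args(width=1280, height=800):
    """Chrome flags commonly used for headless runs on a server."""
    return [
        "--headless=new",           # no visible browser window
        "--no-sandbox",             # often needed when running as root in a container
        "--disable-dev-shm-usage",  # avoid crashes when /dev/shm is small
        f"--window-size={width},{height}",
    ]


def take_screenshot(url, path="page.png"):
    """Open url headlessly, save a screenshot to path, return the page title."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        driver.save_screenshot(path)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

Calling take_screenshot("https://example.com") on the server would write page.png to the current directory without ever opening a window.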

How to control Firefox Multi-Account Containers using Python and Selenium?

I am working on a web automation project that involves working with the same web page under different accounts at the same time. For this purpose, I am using Firefox Multi-Account Containers.
Can anyone help me automate container operations using Python and Selenium, e.g. opening a site in a new tab with a specific container?

Running Selenium WebDriver in AWS Lambda with Python

I want to run BeautifulSoup and Selenium WebDriver in AWS Lambda, and my runtime environment is Python 3.6. Is this possible? If so, how? My intention is to scrape data from a web page using Beautiful Soup 4 and Selenium (since the data is generated dynamically by JavaScript).

Yes, it's possible. You need to package a headless Chrome binary and chromedriver along with all the Python packages you need. You'll also need to set several options in Selenium's Chrome web driver to make it work.
I wrote a step-by-step tutorial after spending several frustrating weeks trying to deploy it.
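The options mentioned above might look roughly like the sketch below. It is not the tutorial's code. It assumes Selenium 4 (which needs a newer Python runtime than the 3.6 in the question), and the binary locations under /opt are hypothetical and depend on how the deployment package or layer is built. The one firm constraint is that Lambda only lets the function write under /tmp, so Chrome's profile and cache are pointed there.

```python
# Hypothetical locations of the packaged binaries; adjust to your package.
CHROME_BINARY = "/opt/headless-chromium"
CHROMEDRIVER_BINARY = "/opt/chromedriver"


def lambda_chrome_args(tmp_dir="/tmp"):
    """Chrome flags for a sandboxed function environment."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--single-process",
        "--disable-dev-shm-usage",
        f"--user-data-dir={tmp_dir}/chrome-user-data",  # /tmp is writable
        f"--disk-cache-dir={tmp_dir}/chrome-cache",
    ]


def lambda_handler(event, context):
    """Load the page given in event["url"] and return its title."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in lambda_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(
        service=Service(executable_path=CHROMEDRIVER_BINARY),
        options=options,
    )
    try:
        driver.get(event["url"])
        # driver.page_source could be handed to Beautiful Soup from here.
        return {"title": driver.title}
    finally:
        driver.quit()
```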

You will need to create a deployment package and upload it to Lambda if you are going to use dependencies outside of the standard library.
I have a write-up about using BS4 and Lambda together. I did not use Selenium within Lambda, but I do have extensive Selenium experience. You will not be able to execute commands within a browser using Lambda. You will need to stand up a remote server running Selenium Server. Download Selenium and the webdrivers on the machine that will do the web scraping and start the .jar file; it opens a port on that machine, which Selenium will communicate with.
Considering that you will need a machine, probably running Windows, to fire up a browser and scrape these pages, you probably don't need Lambda in the end.
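For the remote-server approach described in this answer, a client script could connect to the running Selenium Server roughly as sketched below. This assumes Selenium 4 on the client side. The host name is a placeholder, 4444 is Selenium Server's default port, and the /wd/hub path is an assumption that depends on the server version.

```python
def selenium_server_url(host, port=4444, path="/wd/hub"):
    """Build the address of a remote Selenium Server."""
    return f"http://{host}:{port}{path}"


def remote_page_title(page_url, host):
    """Drive a browser on the remote machine and return the page title."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(
        command_executor=selenium_server_url(host),
        options=options,
    )
    try:
        driver.get(page_url)
        return driver.title
    finally:
        driver.quit()  # ends the session on the remote server
```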

Error when running my Google Cloud Endpoints Python application on localhost with the App Engine SDK

I followed the Google Cloud Endpoints configuration described at
https://cloud.google.com/appengine/docs/python/endpoints/test_deploy
exactly. I run Google Chrome with the flags
--user-data-dir=temp --unsafely-treat-insecure-origin-as-secure=http://localhost:8080
as described at
https://developers.google.com/explorer-help/#hitting_local_api
Google Chrome then tells me:
You are using an unsupported command-line (--unsafely-treat-insecure-origin-as-secure) flag. Stability and security will suffer
If I visit
http://localhost:8080/_ah/api/explorer
I get the error:
The API you are exploring is hosted over HTTP, which can cause problems. Learn how to use Explorer with a local HTTP API.
I tried adding the --test-type flag, as suggested at
http://stackoverflow.com/questions/32042187/chrome-error-you-are-using-an-unsupported-command-line-flag-ignore-certifcat
After that, Google Chrome no longer shows the warning, but when I visit
http://localhost:8080/_ah/api/explorer
I get the same error.
My app works fine on localhost except for the endpoints part, and everything works fine on appspot.com (endpoints too).
I am using the latest versions of:
Python 2.7.11
App Engine SDK 1.9.35
Google Chrome 49.0.2623.110
Thank you, and sorry for my English.

Click the shield button in Chrome's URL bar.
Click "Load unsafe scripts".
Click services (on the left bar) to reload the page.

I tried troubleshooting this issue when it first started happening for me. I quickly gave up and decided to just use another browser for the API explorer on localhost. IE 11 works for me. This isn't a great answer, but if you have other browsers installed, give them a try.

While Alex has the best answer, I'd just like to point out that this is only a problem with the API Explorer (which can definitely be handy).
It doesn't affect direct calls to the API itself, so the URL below works fine without opening a special sandboxed Chrome app or changing the script settings:
localhost:8080/_ah/api/greeting/v1/greetings/1
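Such a direct call can also be made from a small client script instead of the browser. The sketch below uses only the Python 3 standard library, so it would run as a separate script rather than inside the Python 2.7 app. It assumes the dev server is running on localhost:8080 and that the endpoint returns JSON, and the helper names are illustrative.

```python
import json
from urllib.request import urlopen


def endpoint_url(api, version, path, host="localhost:8080"):
    """Build the URL of a locally served Cloud Endpoints method."""
    return f"http://{host}/_ah/api/{api}/{version}/{path}"


def fetch_json(url, timeout=10):
    """GET the URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the dev server running, fetch_json(endpoint_url("greeting", "v1", "greetings/1")) requests the same URL shown above.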
