Running selenium webdriver in amazon lambda python - python

I want to run BeautifulSoup and Selenium WebDriver in Amazon Lambda; my runtime environment is Python 3.6. Is this possible, and if so, how? I intend to scrape data from a web page using Beautiful Soup 4 and Selenium (since the data I need is generated dynamically by JavaScript).

Yes, it's possible. You need to package a headless Chrome binary and chromedriver along with all the Python packages you need. You'll also need to set several options in Selenium's Chrome web driver to make it work.
I wrote a step-by-step tutorial after spending several frustrating weeks trying to deploy it.
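As a rough illustration of the options mentioned above, here is a minimal sketch, not the tutorial's actual code. The binary paths, the `chrome_arguments` and `make_driver` helpers, and the `url` key in the event are assumptions made for illustration. It is written against the Selenium 4 API; Selenium 3 (the last series supporting Python 3.6) passes `executable_path` to `webdriver.Chrome` directly instead of a `Service` object.

```python
import os

# Hypothetical locations: where the packaged binaries end up depends on
# how the deployment package (or layer) is built.
CHROME_BINARY = "/opt/bin/headless-chromium"
CHROMEDRIVER = "/opt/bin/chromedriver"


def chrome_arguments(tmp_dir="/tmp"):
    """Return Chrome flags often used in restricted, display-less environments."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--single-process",
        "--window-size=1280,1696",
        # /tmp is the writable location in Lambda, so Chrome's profile
        # and cache directories are pointed there.
        "--user-data-dir=" + os.path.join(tmp_dir, "user-data"),
        "--disk-cache-dir=" + os.path.join(tmp_dir, "cache-dir"),
    ]


def make_driver():
    # Imported lazily so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(
        service=Service(executable_path=CHROMEDRIVER), options=options
    )


def lambda_handler(event, context):
    from bs4 import BeautifulSoup

    driver = make_driver()
    try:
        # "url" is an assumed key in the invocation event.
        driver.get(event["url"])
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return {"title": soup.title.string if soup.title else None}
    finally:
        driver.quit()
```

Which flags are actually required depends on the Chrome build being packaged, so treat the list as a starting point rather than a verified configuration.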

You will need to create a deployment package and upload it to Lambda if you are going to use dependencies outside the standard library.
I have a write-up about using BS4 and Lambda together. I did not use Selenium within Lambda, but I do have extensive Selenium experience. You will not be able to execute commands within a browser using Lambda. You will need a remote server running Selenium Server. Download Selenium and the WebDrivers onto the machine that will do the web scraping and start the .jar file; it opens a port on that machine through which Selenium communicates.
Since you will need a machine, probably running Windows, to launch a browser and scrape these pages, you probably don't need Lambda in the end.
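For reference, connecting to a remote Selenium Server from Python might look like the sketch below. The host name, the `hub_url` and `fetch_rendered_html` helpers, and the choice of Chrome are assumptions for illustration; port 4444 and the `/wd/hub` path are the server's classic defaults. The call uses the Selenium 4 form of `webdriver.Remote`.

```python
def hub_url(host, port=4444):
    """Build the WebDriver endpoint of a Selenium Server."""
    return "http://{}:{}/wd/hub".format(host, port)


def fetch_rendered_html(url, host):
    """Load a page in a browser on the remote machine and return its HTML."""
    # Imported lazily so hub_url() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=hub_url(host), options=options)
    try:
        driver.get(url)
        # page_source reflects the DOM after JavaScript has run.
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then be handed to Beautiful Soup on the calling side, so only the browser itself has to live on the remote machine.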

Related

What other options are there to control a browser with python that aren't selenium?

I've been working a lot with browser automation and Python lately, using Selenium and chromedriver, but I have found a few limitations. For example, it's very easy for websites to tell that you are using Selenium, and each Chrome instance takes up a lot of memory while running. Are there any alternative Python libraries that can control a browser window in the same ways that Selenium does?
Thanks
There is Pylenium, which I'm aware of. It's built on top of Selenium but exposes a Cypress-style DSL. You can check out the documentation here:
https://elsnoman.gitbook.io/pylenium/

How do I package a Python web scraper as a Chrome Extension?

I made an Amazon web scraper and I want it to work as an extension. It should display the price if it is lower than the previously recorded price. I don't know JavaScript. I looked at tools like Transcrypt but didn't understand much.
You cannot. Chrome extensions are written in JS.
The only way to accomplish what you want is to use the extension as a bridge from the user's browser to your script. You'll need to convert the script into a server of some kind that can accept requests from the extension and respond.
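A minimal sketch of the server side of such a bridge, using only the standard library, could look like this. The `scrape_price` placeholder, the JSON field names, and the port are assumptions for illustration; the extension would call the endpoint with `fetch()` and display the result.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Remembers the most recent price between requests (in memory only).
last_price = {"value": None}


def check_price(current, previous):
    """Describe the current price and whether it dropped below the previous one."""
    dropped = previous is not None and current < previous
    return {"price": current, "previous": previous, "dropped": dropped}


def scrape_price():
    """Placeholder for the existing scraper; returns a fixed value here."""
    return 19.99


class PriceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        current = scrape_price()
        payload = check_price(current, last_price["value"])
        last_price["value"] = current
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Lets a fetch() call from the extension read the response.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PriceHandler).serve_forever()
```

Calling `serve()` starts the server; a real deployment would also need persistent storage for the previous price and a host the extension can reach.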

Portable selenium

For all my Python projects where I use Selenium to scrape websites, I can only run the script on my own machine. If I sent the script to a client who needed to run it daily, it most probably wouldn't work.
Is there a way to use Selenium WebDriver so that the script is portable and able to run on any platform, so that I could send it to my clients and be confident that it would work? I couldn't find anything definite on the internet that would help me.
If this is not possible with Selenium, is it possible with some other Python module? So far I have used Selenium to scrape pages that use JavaScript. Should I switch to something else for portability? Please advise me. I would really appreciate it if someone could point me in the right direction.
I would download a version of your browser driver (e.g. chromedriver for Chrome) for all available platforms and put all of them in the script folder.
I would then zip it and share it with the customer.
It would also be quite easy to build a script that automatically checks the local operating system and downloads the needed driver from the internet (using Python wget or similar), but I do not see a serious advantage in this approach.
As a final thought, it is also possible to use Selenium with a remote WebDriver, but that would complicate things and leave you with a server to maintain and update.
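The bundling approach described above can be sketched as a small helper that picks the right driver at run time. The folder layout (`drivers/windows`, `drivers/linux`, `drivers/mac`) and the function name are assumptions for illustration.

```python
import os
import platform

# Hypothetical layout of the bundled drivers inside the script folder,
# keyed by the values platform.system() returns.
DRIVER_NAMES = {
    "Windows": os.path.join("windows", "chromedriver.exe"),
    "Linux": os.path.join("linux", "chromedriver"),
    "Darwin": os.path.join("mac", "chromedriver"),
}


def bundled_driver_path(base_dir, system=None):
    """Return the path of the bundled chromedriver for the given or current OS."""
    system = system or platform.system()
    try:
        relative = DRIVER_NAMES[system]
    except KeyError:
        raise RuntimeError("No bundled driver for platform: " + system)
    return os.path.join(base_dir, "drivers", relative)
```

The script would then pass `bundled_driver_path(os.path.dirname(os.path.abspath(__file__)))` to Selenium when creating the driver. The client still needs a matching browser version installed, which this helper does not check.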

Run iMacros in Firefox via Python

I have a process that uses iMacros in Firefox to open some websites, click some buttons, and do a few other things (nothing weird, just internal work pages). The problem is that I basically can't use my computer while that happens.
I want to automate this via python and found this:
Integrating iMacros scripts into python
However the answer to that question and the links mention that I need the business or enterprise version of it.
Is there a way to just do something like:
Open firefox (I know how)
Use iMacros (as a plugin) to run an .iim script from a given location
Thanks!!
You can have 100% control over Firefox with Python, as both are open source. The trick is to figure out the details. Here are some starting points:
Python can script Firefox with Selenium WebDriver.
With some tricks, you can go deeper into Firefox than basic Selenium interaction, such as opening web pages, allows. This would include giving direct commands to plugins. Here is an example of setting up a Firefox profile in a mode where the normal security restrictions do not apply.
You need to study the Firefox architecture to work out how to trigger iMacros plugin commands from Selenium. This is the tricky part, as it is a very marginal use case and there might not be much information available. Expect to spend a few days learning Firefox internals.
My guess is that you can disable Firefox security and then use Selenium WebDriver to run a JavaScript snippet that gives direct commands to the iMacros component.

Selenium webdriver using PhantomJs fails to redirect in python

I'm trying to automate OAuth2 authentication for a web crawler in Python, and I chose Selenium as the WebDriver for this.
I won't have root on the machines I'm going to use, and the script needs to be headless, so I chose PhantomJS because I won't be able to install Xvfb on these machines.
I have now found out that PhantomJS has this bug: https://github.com/ariya/phantomjs/issues/10389
The OAuth2 page I need to use has that kind of redirect. The workaround on that page is in JavaScript and is of little use to me. Is there a workaround (given these constraints) or another way to run completely headless in Python?
