I have a Node app running on Heroku. I scrape a website using Selenium in Python and call the Python script from my Node app whenever I need to. I installed PhantomJS on my Mac, and when I run the app locally (node index.js), everything works fine.
from selenium import webdriver
path_to_phantom = '/Users/govind/Desktop/phantomjs-2.1.1-macosx/bin/phantomjs'
browser = webdriver.PhantomJS(executable_path=path_to_phantom)
However, nothing seems to work on Heroku. I also added the PhantomJS buildpack to my Node app, but it just doesn't call the Python script. I think the problem is the path to the PhantomJS binary installed by the buildpack. What path should I use? Or is there some other aspect I'm missing?
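A hard-coded Mac path can't exist on a dyno. A minimal sketch of looking the binary up at runtime instead, so the same script works locally and on Heroku. It assumes the buildpack puts `phantomjs` on PATH. `PHANTOMJS_PATH`, `find_phantomjs` and `make_browser` are hypothetical names. `webdriver.PhantomJS` only exists in Selenium 3.x, because PhantomJS support was removed in Selenium 4.

```python
import os
import shutil


def find_phantomjs(env_var="PHANTOMJS_PATH"):
    """Return a path to the PhantomJS binary, or None if it can't be found.

    PHANTOMJS_PATH is a hypothetical override variable, not something the
    buildpack sets. Without it, fall back to searching PATH.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    # shutil.which() searches PATH the same way the shell does
    return shutil.which("phantomjs")


def make_browser():
    # Imported lazily so find_phantomjs() can be used without Selenium installed
    from selenium import webdriver

    path = find_phantomjs()
    if path is None:
        raise RuntimeError("phantomjs not found on PATH; set PHANTOMJS_PATH")
    return webdriver.PhantomJS(executable_path=path)
```

Locally you can set PHANTOMJS_PATH to the Desktop path. On Heroku you leave it unset and rely on PATH.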
I managed to use Selenium with PhantomJS in my Python application deployed to Heroku following these steps:
1) Switch to using the Cedar-14 stack on my Heroku application
$ heroku stack:set cedar-14
2) Install a PhantomJS buildpack
$ heroku buildpacks:add https://github.com/stomita/heroku-buildpack-phantomjs
With these changes, I could then use Selenium to fetch websites:
from selenium import webdriver
browser = webdriver.PhantomJS()
browser.get("http://www.google.com") # This does not throw an exception if it got a 404
html = browser.page_source
print(html)  # If this outputs more than just '<html><head></head><body></body></html>', it worked
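`get()` doesn't raise on a 404 or a failed load, so the check in that last comment can be turned into a small helper. This is a sketch: `page_loaded` is a hypothetical name, and it assumes that whitespace and letter case in `page_source` don't matter.

```python
import re

# The skeleton page_source returns when nothing was actually loaded
EMPTY_PAGE = "<html><head></head><body></body></html>"


def page_loaded(html):
    """Return True if html contains more than the empty page skeleton."""
    # Drop all whitespace and lowercase, so formatting differences don't matter
    compact = re.sub(r"\s+", "", html or "").lower()
    return bool(compact) and compact != EMPTY_PAGE
```

Usage: call `page_loaded(browser.page_source)` right after `browser.get(...)` and treat False as a failed fetch.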
I tried everything I could find. I tried connecting through an extension, but it was unsuccessful (I couldn't find an extension config). I tried the internal settings (about:config). I tried connecting with JS inside Chrome.
Can I just use a proxy for the entire process (WebDriver)?
Chrome can't work with SOCKS5, but you can use it in Firefox with an add-on (FoxyProxy):
1) Create the WebDriver
2) Install the add-on
3) Set up the proxy and select it
4) Done!
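Steps 1 and 2 can be scripted. This is a sketch that assumes a Selenium version where Firefox's driver has `install_addon()` (later 3.x releases and 4.x) and that the add-on's .xpi file is already downloaded. `start_firefox_with_addon` and the .xpi path are hypothetical. Step 3, selecting the proxy, is still done in the add-on's own settings.

```python
from pathlib import Path


def start_firefox_with_addon(xpi_path):
    """Start Firefox and install a proxy add-on from a local .xpi file.

    xpi_path is a placeholder: point it at the add-on file you downloaded.
    """
    xpi = Path(xpi_path)
    # Fail early with a clear message instead of an opaque WebDriver error
    if not xpi.is_file():
        raise FileNotFoundError(f"add-on not found: {xpi}")

    # Imported here so the path check above runs without Selenium installed
    from selenium import webdriver

    driver = webdriver.Firefox()
    # temporary=True: the add-on is not kept once this browser session closes
    driver.install_addon(str(xpi.resolve()), temporary=True)
    return driver
```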
I have a Django web app hosted in IIS with a function that uses Selenium to automate transferring data from another application. It works when I run it on localhost, but under IIS it doesn't even open the browser.
Here is the code that opens the website I need:
from selenium import webdriver
def transfer(request):
    browser = webdriver.Chrome()
    # Set the timeout before get(); it only applies to loads started after this call
    browser.set_page_load_timeout(10)
    browser.get('http://localhost/virtus#scorecard/rateteammember')
I am using Windows 10 and IIS version 10.0.15063.0.
I am also using ChromeDriver version 2.42.591088, which prints 'Only local connections are allowed'. I think this is the source of the problem.
Can someone please advise what to do next?
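As far as I know, 'Only local connections are allowed' is ChromeDriver's normal startup notice. It means the driver only accepts connections from the same machine, and that is all Selenium needs here. A more likely cause, though it is an assumption worth testing, is that the IIS worker runs under an app-pool identity with no interactive desktop, so a visible Chrome window can't appear. Its PATH may also differ from yours. Below is a sketch that runs Chrome headless and gives ChromeDriver an explicit path. It uses the Selenium 3.x `executable_path` argument. `DEFAULT_DRIVER`, `CHROMEDRIVER_PATH`, `chrome_arguments` and `make_driver` are hypothetical.

```python
import os

# Hypothetical default; point this at wherever chromedriver.exe actually lives
DEFAULT_DRIVER = r"C:\tools\chromedriver.exe"


def chrome_arguments(headless=True):
    """Command-line switches for Chrome when started from a service account."""
    args = ["--disable-gpu", "--window-size=1280,1024"]
    if headless:
        # No desktop session is available to the worker, so render off-screen
        args.insert(0, "--headless")
    return args


def make_driver(driver_path=None):
    # Imported lazily so chrome_arguments() can be used without Selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)
    path = driver_path or os.environ.get("CHROMEDRIVER_PATH", DEFAULT_DRIVER)
    return webdriver.Chrome(executable_path=path, options=options)
```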
Has anyone had experience deploying a Django app on Heroku that contains a script within it?
I'm building a web app that serves as a front end for a couple of Python bots I have written over the years, as well as for some useful personal tools (scripts). I've been trying to deploy the app on Heroku and test running my Tinder bot from within Heroku, but without success. I'm currently getting errors such as:
"unknown error: Chrome failed to start: exited abnormally (unknown error: DevToolsActivePort file doesn't exist"
Things I've done to try to solve the issue:
1- Setting the Chrome arguments:
from selenium.webdriver.chrome.options import Options
chrome_settings = Options()
chrome_settings.add_argument('--headless')
chrome_settings.add_argument('--no-sandbox')
chrome_settings.add_argument('--disable-dev-shm-usage')
As well as setting the paths to the Chrome binary and ChromeDriver:
GOOGLE_CHROME_BIN = '/app/.apt/usr/bin/google-chrome'
path_of_chrome_driver = '/app/.chromedriver/bin/chromedriver'
But I am still getting the same error. Does anyone know the proper procedure for creating a web app that can run Python scripts and be deployed on Heroku?
PS: I have followed the steps to set up WhiteNoise and gunicorn for Heroku deployment.
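The snippets in the question define `GOOGLE_CHROME_BIN`, but it isn't clear that the value ever reaches Selenium. If it is never assigned to `binary_location`, ChromeDriver won't know where the buildpack installed Chrome. This is a sketch under these assumptions: the Chrome buildpack exports a `GOOGLE_CHROME_BIN` environment variable, the fallbacks are the paths from the question, and `CHROMEDRIVER_PATH`, `build_chrome_options` and `make_driver` are hypothetical names.

```python
import os

# Fallbacks are the paths from the question; adjust if your buildpacks differ
CHROME_BIN = os.environ.get("GOOGLE_CHROME_BIN", "/app/.apt/usr/bin/google-chrome")
DRIVER_PATH = os.environ.get("CHROMEDRIVER_PATH", "/app/.chromedriver/bin/chromedriver")

HEROKU_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def build_chrome_options(options, binary=CHROME_BIN, args=HEROKU_ARGS):
    """Apply the binary location and switches to a Selenium Options object."""
    # Without binary_location, ChromeDriver searches its default install
    # locations, which may not include the buildpack's path
    options.binary_location = binary
    for arg in args:
        options.add_argument(arg)
    return options


def make_driver():
    # Imported lazily so build_chrome_options() can be used without Selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = build_chrome_options(Options())
    return webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
```

`executable_path` matches the selenium==3.14.1 pin in the requirements below.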
Requirements:
Django==2.1.1
gunicorn==19.9.0
pytz==2018.5
selenium==3.14.1
urllib3==1.23
whitenoise==4.1
This is the site I am developing: https://bot-tools-collection.herokuapp.com/
Update
Link to the code:
https://gist.github.com/keithlowc/d0b274005ecf9d41b4f087620b487dc5
Hey guys, I deployed my Scrapy project to Heroku and the deploy succeeded. But when I execute heroku run scrapy crawl crawlername, I get the error:
RuntimeError: Could not find firefox in your system PATH. Please specify the firefox binary location or install firefox.
I have added these lines to /etc/paths:
/Applications
/Applications/Firefox.app
/Applications/Firefox.app/Contents/MacOS/firefox-bin
/Applications/Firefox.app/Contents/MacOS/firefox
After adding them to /etc/paths, I can start Firefox from the terminal by just typing firefox. But when I rerun heroku run scrapy crawl crawlername, I get the same error.
self.driver = webdriver.Firefox()
self.driver.implicitly_wait(10)
That's inside my Scrapy script, to open the web browser.
If I'm in the directory where I created the Heroku app and run scrapy crawl crawlername, everything works fine and I get my results. But when I run heroku run scrapy crawl crawlername, I get the Firefox error shown above. Any help will be greatly appreciated, thanks.
As heinst mentions, Firefox isn't installed on Heroku. This means you need Firefox not on your local machine but on the Heroku instance that runs your script.
The lines you've added to /etc/paths apply to your Mac, not to the Heroku instance you are trying to run your application on.
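One way to see the difference is to run the same check both locally and with `heroku run python check_browsers.py`, where `check_browsers.py` is a hypothetical file name. Until a buildpack installs Firefox, the dyno should presumably report it as NOT FOUND. Recent Firefox versions with Selenium 3 also need `geckodriver`, so the check looks for that as well. `browser_report` is a hypothetical helper.

```python
import os
import shutil
import socket


def browser_report(names=("firefox", "geckodriver")):
    """Describe where each executable resolves on this machine's PATH."""
    lines = [f"host: {socket.gethostname()}"]
    for name in names:
        found = shutil.which(name)
        lines.append(f"{name}: {found or 'NOT FOUND'}")
    lines.append("PATH=" + os.environ.get("PATH", ""))
    return "\n".join(lines)


if __name__ == "__main__":
    print(browser_report())
```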
Alternatively, try something like this buildpack. It requires some additional tools, such as Xvfb, to give Firefox an in-memory display buffer so that Selenium can click the right buttons and so on.
Alternatively, why not use Scrapy without Selenium? Most tasks can be done without any browser interaction, although you will of course have to search a bit more to find your solution.
So here's my setup:
I'm using a Flask server with uWSGI, and through a controller action I call a Python script that uses Splinter (which uses Selenium) to automate the GUI. The web server doesn't have a display, so I'm using Xvfb.
SSHing into the machine, running Xvfb, exporting DISPLAY=:99 and then running the Python script works great. But running it through a controller action does not work. I get the following error:
WebDriverException: Message: The browser appears to have exited before we could connect.
(This is the same error that is returned when Xvfb isn't running.)
ps aux shows that Xvfb is running as the same user as the web server. (I've isolated everything and have a separate controller action that executes the following:)
p = subprocess.Popen("Xvfb :99 &", stdout=fstdout, stderr=fstderr, shell=True)
DISPLAY is set to :99 for both root and the web server user.
I could install vncserver and try that, but I suspect I would end up with the same problem. I've also tried avoiding calling Xvfb directly and using PyVirtualDisplay instead, with the same result.
Edit: it errors on this line (if using Splinter):
from splinter import Browser
browser = Browser()
or, if using Selenium:
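import pyvirtualdisplay
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
with pyvirtualdisplay.Display(visible=False):  # visible=True starts Xephyr, which needs an existing X display
    binary = FirefoxBinary()
    driver = webdriver.Firefox(firefox_binary=binary)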
(it errors on the last line there)
Any ideas?
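A likely culprit is that the uWSGI worker's own environment lacks DISPLAY. This is an assumption, since the question doesn't show how the script is launched. Exporting DISPLAY in a login shell or a user profile doesn't affect a process started by a service manager. The sketch below passes DISPLAY explicitly to a child process. `env_with_display` and `run_scraper` are hypothetical names.

```python
import os
import subprocess
import sys


def env_with_display(display=":99", base=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_scraper(script_path, display=":99"):
    # Pass the environment explicitly: the uWSGI worker may not have DISPLAY
    # even if the user's shell profile exports it
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_display(display),
        capture_output=True,
        text=True,
    )
```

The Splinter code may run inside the worker process itself rather than as a child process. In that case, setting `os.environ["DISPLAY"] = ":99"` before creating `Browser()` should have the same effect, because the browser Selenium launches inherits the worker's environment.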