Heroku Python Scrapy firefox error - python

Hey guys, I tried to deploy my Scrapy project to Heroku and the deploy itself went through. But when I execute heroku run scrapy crawl crawlername I get this error:
RuntimeError: Could not find firefox in your system PATH. Please specify the firefox binary location or install firefox.
I have added these lines to /etc/paths:
/Applications
/Applications/Firefox.app
/Applications/Firefox.app/Contents/MacOS/firefox-bin
/Applications/Firefox.app/Contents/MacOS/firefox
After adding them to /etc/paths I am able to run Firefox from the terminal just by typing firefox, and Firefox opens. But when I rerun heroku run scrapy crawl crawlername I get the same error.
self.driver = webdriver.Firefox()
self.driver.implicitly_wait(10)
That's inside my Scrapy script to open the web browser.
If I'm in the same directory where I created the Heroku app and I run scrapy crawl crawlername, everything works fine and I get my results. But when I run heroku run scrapy crawl crawlername I get the Firefox error shown above. Any help will be greatly appreciated, thanks.

As heinst mentions, Firefox isn't installed on Heroku. You need Firefox installed not on your local machine but on the Heroku instance that is running your script.
The lines you've added to /etc/paths are on your Mac, not on the Heroku instance you are trying to run your application on.
To get Firefox onto the instance, try something like this buildpack. It requires some additional tools such as Xvfb, which gives Firefox an in-memory display buffer so that Selenium can click the right buttons and so on.
Alternatively, why not use Scrapy without Selenium? Most tasks can be done without any browser interaction, although you will of course have to search a bit more to find your solution.
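To confirm which machine is actually missing Firefox, a small check like the one below can be run in both places. This is only a sketch using the standard library: find_browser and the candidate names are illustrative, not part of Scrapy or Selenium.

```python
import shutil


def find_browser(candidates=("firefox", "firefox-bin")):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)  # searches the PATH of the current machine
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_browser()
    if location is None:
        print("firefox is not on this machine's PATH")
    else:
        print("firefox found at", location)
```

Saved under a hypothetical name such as check_firefox.py, it can be run locally with python check_firefox.py and on the dyno with heroku run python check_firefox.py. The two results differ because each command looks at its own machine's PATH.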

Related

How to know the Chrome version on Heroku

I am using the Heroku Chrome buildpack and am wondering what version it is running. I am trying to use it with a webpage but the webpage tells me I can't use it because the site needs chrome version 60 or above.
However the chromedriver used with it is version 103 something.
Is there some update that needs to happen or is the code not going to work?
To start with, launch the google-chrome browser manually and open chrome://settings/help to check the browser version, then update the browser if required.
Once your program successfully initiates the Google Chrome browsing context, you can extract the browser version at run time, programmatically.
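A rough sketch of the programmatic check follows. It assumes the Selenium session exposes its capabilities as a dictionary (driver.capabilities) and that the version is stored under "browserVersion", with "version" used by older drivers. The helper names and the sample dictionary are made up for illustration.

```python
import re


def browser_version_from_caps(caps):
    """Pick the browser version string out of a capabilities dict.

    Assumption: newer sessions use "browserVersion", older ones "version".
    """
    return caps.get("browserVersion") or caps.get("version")


def major_version(version_text):
    """Return the first run of digits as an int, or None if there is none."""
    match = re.search(r"\d+", version_text or "")
    return int(match.group(0)) if match else None


if __name__ == "__main__":
    # Made-up sample; with a live session you would pass driver.capabilities.
    sample_caps = {"browserName": "chrome", "browserVersion": "103.0.5060.53"}
    version = browser_version_from_caps(sample_caps)
    print("browser version:", version)
    print("meets the site's minimum of 60:", major_version(version) >= 60)
```

Comparing the major version with the site's stated minimum (60 in the question) shows whether the browser itself is too old, as opposed to the chromedriver.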

PhantomJS path on Heroku

I have a node app running on Heroku. I am scraping a website using selenium in python and calling the python script from my node app whenever I need to. I installed PhantomJS on my mac and when I run the app locally (node index.js), everything works just fine.
path_to_phantom = '/Users/govind/Desktop/phantomjs-2.1.1-macosx/bin/phantomjs'
browser = webdriver.PhantomJS(executable_path = path_to_phantom)
However, nothing seems to work on Heroku. I also added the PhantomJS buildpack to my node app but it just doesn't call the python script. The problem I think is the path to PhantomJS buildpack. What path should I add? Or is there any other aspect I'm missing here?
I managed to use Selenium with PhantomJS in my Python application deployed to Heroku following these steps:
1) Switch to using the Cedar-14 stack on my Heroku application
$ heroku stack:set cedar-14
2) Install a PhantomJS buildpack
$ heroku buildpacks:add https://github.com/stomita/heroku-buildpack-phantomjs
With these changes I could then use Selenium to fetch websites
from selenium import webdriver
browser = webdriver.PhantomJS()
browser.get("http://www.google.com") # This does not throw an exception if it got a 404
html = browser.page_source
print(html) # If this outputs more than just '<html><head></head><body></body></html>' you know that it worked
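The hard-coded Mac path from the question cannot exist on a Heroku dyno, while the answer's webdriver.PhantomJS() call relies on the binary being found on PATH. One way to keep a single script working in both places is to resolve the path at run time. This is a sketch only: resolve_phantomjs and the PHANTOMJS_PATH environment variable are hypothetical names, not something the buildpack defines.

```python
import os
import shutil


def resolve_phantomjs(env_var="PHANTOMJS_PATH"):
    """Return an explicit override if one is set, else whatever PATH offers.

    env_var is a hypothetical variable name chosen for this example.
    Returns None when no override is set and phantomjs is not on PATH.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    return shutil.which("phantomjs")


if __name__ == "__main__":
    path = resolve_phantomjs()
    print("phantomjs path:", path)
    # With Selenium installed, and only when path is not None:
    # browser = webdriver.PhantomJS(executable_path=path)
```

Locally the variable could point at the downloaded binary, and on Heroku it can stay unset so the PATH lookup is used.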

Permission denied when trying to open Firefox to run Selenium tests via Django's manage.py

I'm running on Ubuntu 16.04.
I've been dealing with this issue for a while now, and I have not been able to find a solution on my own. When I run python3 manage.py test, my tests are supposed to open a Firefox browser via Selenium and execute some functional tests. I get the same error message on every test I try to run:
selenium.common.exceptions.WebDriverException: Message: Failed to start browser /home/spa/Desktop/firefox: permission denied
I have tried to do chmod a+rwx on the firefox folder, but I still get the same error. Attempting to sudo python3 manage.py test results in the same issue. Any help would be appreciated.
I have had a similar problem on my Mac when starting Firefox using a binary path. What I did to fix it was to give the binary path of the exact file needed to start Firefox (usually a shell script inside the firefox folder) instead of the folder itself.
There have been some problems with opening browsers from folders, as seen here, so this was the only way I found to fix the problem.
There is one more reason I can think of why it would not work: an incompatibility between your Selenium version and the Firefox you are using. What are the versions, and what is the actual code you use?
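To check whether a configured binary path points at a folder rather than at the file inside it, something like the following sketch can help. It uses only the standard library, and classify_binary_path is an illustrative name rather than a Selenium function.

```python
import os
import tempfile


def classify_binary_path(path):
    """Describe why a path can or cannot be launched as a program."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # A folder cannot be executed; point at the file inside it instead.
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Demonstration with a throwaway folder standing in for the firefox folder.
    with tempfile.TemporaryDirectory() as folder:
        print(folder, "->", classify_binary_path(folder))
        print("firefox file inside ->",
              classify_binary_path(os.path.join(folder, "firefox")))
```

If the path from the error message comes back as "directory", then changing permissions on the folder will not help, and the path should name the launcher file within it.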

Running chromedriver Through Django Selenium Testing

It seems that a lot of people are having trouble getting Selenium to find chromedriver, so this may apply to them as well if they actually have chromedriver.exe in the correct path.
It seems I have everything I need to have these selenium tests working, and when I manually try running the following 2 lines, everything works fine (it finds chromedriver and opens Chrome).
from selenium import webdriver
webdriver.Chrome()
However, when I put the exact same code into a Django test and try running the test through Django, I get a "ChromeDriver executable needs to be available in the path" error. I've tried reinstalling Django and Selenium, with no success.
Any help would be appreciated!
The problem was actually caused by enabling Celery tasks.

Cannot create browser process when using selenium from python on RHEL5

I'm trying to use selenium from python but I'm having a problem running it on a RHEL5.5 server. I don't seem to be able to really start firefox.
from selenium import webdriver
b = webdriver.Firefox()
On my laptop with Ubuntu this works fine: it starts and brings up a Firefox window. When I log in to the server with ssh, I can run firefox from the command line and get it displayed on my laptop. It is clearly Firefox from the server since it has the RHEL5.5 home page.
When I run the Python script above on the server (or run it in ipython), the script hangs at webdriver.Firefox().
I have also tried
from selenium import webdriver
fb = webdriver.FirefoxProfile()
fb.native_events_enabled=True
b=webdriver.Firefox(fb)
Which also hangs on the final line there.
I'm using python2.7 installed in /opt/python2.7. I installed selenium with /opt/python2.7/pip-2.7.
I can see the firefox process on the server with top and it is using a lot of CPU. I can also see from /proc/#/environ that the DISPLAY is set to localhost:10.0 which seems right.
How can I get a browser started with selenium on RHEL5.5? How can I figure out why Firefox is not starting?
It looks like the problem I'm encountering is this selenium bug:
http://code.google.com/p/selenium/issues/detail?id=2852
I used the fix described in comment #9 http://code.google.com/p/selenium/issues/detail?id=2852#c9
That worked for me.
