Selenium firefox instances freeze - python

I have a process running to scrape some information in bulk. It uses Selenium and starts one Firefox instance per URL to be scraped.
Similar processes run on multiple instances (to split the workload). On one such instance, I now see processes freezing: they keep running for hours and hours. I have not put any conditional waits in my Selenium scripts. The other instances are running fine (and they scrape the same website).
Selenium version - 3.0.2
Geckodriver version - 0.13.0
Firefox version - Mozilla Firefox 50.1.0
I tried going through the geckodriver logs but could not make sense of anything. Any ideas on how to debug this further?
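One thing that may help narrow it down (a sketch, not from the original post; it only uses the standard Selenium timeout API, and whether this geckodriver 0.13.0 / Selenium 3.0.2 combination honours the page-load timeout is worth verifying) is to give every driver hard timeouts, so a hung page load raises an exception instead of blocking forever:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Firefox()
driver.set_page_load_timeout(60)   # give up on page loads after 60 seconds
driver.set_script_timeout(30)      # give up on long-running async scripts

try:
    driver.get("https://example.com/page-to-scrape")  # placeholder URL
except TimeoutException:
    # log and move on instead of letting the process run for hours
    print("page load timed out")
finally:
    driver.quit()
A timeout at least turns a silent freeze into an exception you can log, which makes it easier to see which URL or step is getting stuck.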

Related

How to know the Chrome version on Heroku

I am using the Heroku Chrome buildpack and am wondering what version of Chrome it is running. I am trying to use it with a webpage, but the site tells me I can't use it because it needs Chrome version 60 or above.
However, the chromedriver used with it is version 103-something.
Is there some update that needs to happen or is the code not going to work?
To start with, you need to manually launch the google-chrome browser and open chrome://settings/help to check the browser version, then update the browser if required.
Once your program successfully initiates the Google Chrome browsing context, you can extract the browser version at runtime, programmatically.
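A minimal sketch of that second step (assuming the Python bindings; the capability key is "browserVersion" under the W3C protocol, while older setups expose "version" instead):
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")   # Heroku dynos have no display attached

driver = webdriver.Chrome(options=options)
caps = driver.capabilities
print("Chrome version:", caps.get("browserVersion") or caps.get("version"))
print("Chromedriver version:", caps.get("chrome", {}).get("chromedriverVersion", "unknown"))
driver.quit()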

How to install Selenium (python) on an Apache Web Server?

I have an Apache server up and running with Python 3.x already installed on it. Right now I am trying to run a little Python program (let's say filename.py) ON the server. This program uses Selenium's webdriver for Chrome. It also uses sleep from time (but I think that comes with Python by default, so I figure it won't be a problem).
from selenium import webdriver
When I wrote this program on my computer for the first time, I not only had to write the line of code above but also had to manually download the webdriver for Chrome and put it in /usr/local/bin. Here is the link to the file in case you are wondering: Webdriver for Chrome
Anyway, I do not know what the equivalent steps are to configure this on my server. Do you have any idea how to do it? Or any concepts I could learn about installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can keep the driver executable anywhere and point Selenium at it with an executable path; see here for an example.
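For instance, a minimal sketch using the Selenium 3-style executable_path keyword (the path below is only a placeholder; in Selenium 4 you would pass a Service object instead):
from selenium import webdriver

# point Selenium at a chromedriver binary kept outside /usr/local/bin
driver = webdriver.Chrome(executable_path="/home/myuser/drivers/chromedriver")
driver.get("https://example.com")
driver.quit()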
Solution for running on a server
If you have Python installed on the server (ideally >3.4, which comes with pip by default), then to install ChromeDriver on a standalone server, follow the instructions here.
Note that Selenium always needs an instance of a browser to control.
Luckily, there are browsers out there that aren't as heavy as the usual browsers you know. You don't have to open IE / Firefox / Chrome / Opera. You can use HtmlUnitDriver, which controls HTMLUnit - a headless Java browser without any UI - or PhantomJSDriver, which drives PhantomJS - another headless browser running on WebKit.
Those headless browsers use much less memory, are usually faster (since they don't have to render anything), and don't require a graphical interface to be available on the machine they run on, which makes them easily usable server-side.
Sample code of headless setup
from selenium import webdriver

op = webdriver.ChromeOptions()
op.add_argument('--headless')  # run Chrome without a visible UI
driver = webdriver.Chrome(options=op)
It's also worth reading up on running Selenium RC; see here for that.

Phantomjs / Splinter - Issue with cache

I have an EC2 Ubuntu instance where I have scheduled a script to run twice a day.
The script uses the Splinter Python lib with the PhantomJS headless browser to test some buttons and actions on my website.
I have just noticed that my t1.micro instance is getting slower and slower, to the point where my script no longer launches.
I ran du on my instance and found that PhantomJS is taking up a lot of space on my disk.
Can I remove those files?
How can I prevent this pile-up of files?
I can't find anything related to this for Splinter or PhantomJS.
Thanks!
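Not from the original thread, but for reference: PhantomJS itself has command-line switches for its disk cache and storage locations, so one possible mitigation (a sketch using Selenium's PhantomJS driver directly; whether your Splinter version forwards service_args to it is something to check) is:
from selenium import webdriver

# disable the on-disk cache and keep local storage in one known, cleanable directory
service_args = [
    "--disk-cache=false",
    "--local-storage-path=/tmp/phantomjs-storage",
]
driver = webdriver.PhantomJS(service_args=service_args)
driver.get("https://example.com")
driver.quit()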

Selenium webdriver support for the latest versions of firefox and chrome

I am using selenium-2.35.0 and Python-2.7.
Test cases are written in Python.
My Python code to create the driver object:
from selenium import webdriver
driver = webdriver.Remote(desired_capabilities={
    "browserName": "firefox"
})
And I run the Selenium server with:
java -jar selenium-server-standalone-2.35.0.jar
I had my code working in Firefox 22 - had the Selenium server running, was able to run scripts in Python, etc. So I'm confident the code works.
Recently, I updated Firefox to 23 and now all I get is
"[Errno 10061] No connection could be made because the target machine actively refused it."
I thought maybe I need to restart the server again, or something. But that seems to do nothing. Is this issue related to selenium webdriver's support for the latest browser version?
But according to this link http://selenium.googlecode.com/git/java/CHANGELOG , Selenium supports Firefox 23. If it is supported, code that ran in Firefox 22 should also run in Firefox 23 without any code change.
And how can I make the same code work for Chrome?
I have found that the newest version of Firefox routinely doesn't work well with Selenium right away. Check out this Firefox support matrix on GitHub that someone made. Unfortunately, the only thing you can do is stop Firefox from auto-updating and keep your Selenium tests running on Firefox's newest version minus 1 or 2. Chrome tends to work out of the box with Selenium; sometimes the Beta channel has fixed particular Selenium issues, so try that if you run into a specific problem (on the other hand, it may introduce other bugs). So in the end you need to be constantly wary of browser updates and routinely check how they work with the current version of Selenium.
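For the Chrome part of the question, a minimal sketch (not from the original answer; it assumes chromedriver is on the PATH of the machine running the standalone server) is to keep the same Remote setup and change only the requested browser:
from selenium import webdriver

driver = webdriver.Remote(
    command_executor="http://127.0.0.1:4444/wd/hub",  # the standalone server started above
    desired_capabilities={"browserName": "chrome"},
)
driver.get("https://example.com")
driver.quit()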
Check out this guide on how to get Selenium working with rolled back versions of firefox:
http://inkhorn.ca/selenium-python-on-ubuntu-using-firefox/
It will also fix any errors that have to do with “version xul**.0 not defined in file libxul.so”

Cannot create browser process when using selenium from python on RHEL5

I'm trying to use Selenium from Python, but I'm having a problem running it on a RHEL5.5 server. I don't seem to be able to actually start Firefox.
from selenium import webdriver
b = webdriver.Firefox()
On my laptop with Ubuntu this works fine and brings up a Firefox window. When I log in to the server with ssh, I can run Firefox from the command line and have it displayed on my laptop. It is clearly the Firefox from the server, since it shows the RHEL5.5 home page.
When I run the Python script above on the server (or run it in ipython), the script hangs at webdriver.Firefox().
I have also tried
from selenium import webdriver
fb = webdriver.FirefoxProfile()
fb.native_events_enabled = True
b = webdriver.Firefox(fb)
This also hangs on the final line.
I'm using Python 2.7 installed in /opt/python2.7. I installed selenium with /opt/python2.7/pip-2.7.
I can see the Firefox process on the server with top, and it is using a lot of CPU. I can also see from /proc/#/environ that DISPLAY is set to localhost:10.0, which seems right.
How can I get a browser started with selenium on RHEL5.5? How can I figure out why Firefox is not starting?
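One way to get more visibility into the hang (a sketch, not from the original post; FirefoxBinary's log_file argument is part of the Selenium 2.x Python bindings) is to capture Firefox's own output to a file:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# send Firefox's stdout/stderr to a log file so a silent hang leaves a trace
log = open("/tmp/firefox_webdriver.log", "w")
binary = FirefoxBinary(log_file=log)
b = webdriver.Firefox(firefox_binary=binary)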
It looks like the problem I'm encountering is this selenium bug:
http://code.google.com/p/selenium/issues/detail?id=2852
I used the fix described in comment #9 http://code.google.com/p/selenium/issues/detail?id=2852#c9
That worked for me.
