I have a custom application written in Python on Ubuntu. It's a bit hairy to unwind all the pieces to get to a reduced question (I will post more if I get there), but I have a few things to enumerate. After trial and error, I have narrowed this problem down to just Firefox 14.
Things were fine on Firefox 13; then Firefox 14 arrived as an Ubuntu update, and stuff broke. (This is not uncommon, but I can't find this problem referenced anywhere yet.)
We go to a page in our webservice and reload, 10 or so times, and then the reload hangs, spinning with "Connecting" in the status bar.
Connections in Firefox are getting consumed by XHRs. Increasing the max connection setting in Firefox works around the issue. Basically, we open an XHR that I can't even see in Chrome, but that shows up with a spinner in Firebug on Firefox. That XHR seems to stay open across page reloads, and eventually the open XHRs consume all available connections to the site.
After a couple minutes or so, a connection frees up and the load goes through.
Has anyone seen this? Is there a proper way to release the connection? All other browsers tried are not having this problem.
Thanks!
I have many tests in my Rails application that worked fine before I updated to Firefox 14.01. After that, the Firefox browser opens and just hangs there. I had to switch to Chrome (I downloaded the driver from Google). In case it helps, this is how I initialize the driver in Ruby:
#driver = Selenium::WebDriver.for :chrome, :switches => %w[--ignore-certificate-errors --disable-popup-blocking --disable-translate]
Upgrading to Firefox 15 beta has solved the problem. If I find anything in the FF release notes, I'll update the answer.
There is now a Firefox bug to track this issue.
Related
I'm trying to scrape a website that contains judicial information of my country (Colombia). I have a python script that uses Selenium to open the website and later insert a process number:
from selenium import webdriver

pathDriver = 'yourpathdriver'
driver = webdriver.Chrome(executable_path=pathDriver)
url = 'https://consultaprocesos.ramajudicial.gov.co/Procesos/NumeroRadicacion'
driver.get(url)
However, the script only works the first time it is executed; in later executions I get this error:
selenium.common.exceptions.WebDriverException: Message: unknown error: net::ERR_CONNECTION_CLOSED
I have to wait about 30 minutes to try the script again, but the result is the same, only works the first time.
I've tried to open the browser with the --incognito flag but this doesn't work. Also, I've tried to find a way to send request headers with Selenium but it seems this feature is not supported.
I am using Windows 10 and ChromeDriver.
Is there any Selenium tip to overcome this issue?
Thanks
When I have seen this error, it was a network issue (site not accessible from internal company network). To confirm or exclude this, try to run the tests from a computer outside your company, for example, your home computer. Here are more suggestions, but some of them are advanced (dangerous) and you should execute them only if you know what you are doing.
Additionally, the site takes more than 20 seconds to load on my computer, and in the console I see this error:
GET https://consultaprocesos.ramajudicial.gov.co/js/chunk-3b114a7f.921eecf3.js net::ERR_CONNECTION_TIMED_OUT
However, this does not seem to cause the observed behavior.
Another possible reason could be an outdated browser/WebDriver or incorrect disposal (quit()) of the driver. If the issue is not reproduced manually (opening the site without Selenium), you can try with another WebDriver. You are using Chrome, so try with Firefox.
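To illustrate the point about disposing of the driver, here is a minimal sketch of a wrapper that guarantees quit() is called even when the scraping code raises. The name managed_driver is hypothetical (it is not part of Selenium), and the factory argument is assumed to be something like webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() ends the whole browser session, even if the body raised.
        driver.quit()


# Hypothetical usage, assuming Selenium and a matching ChromeDriver are installed:
#   from selenium import webdriver
#   with managed_driver(webdriver.Chrome) as driver:
#       driver.get("https://consultaprocesos.ramajudicial.gov.co/Procesos/NumeroRadicacion")
```

This only rules out leaked browser sessions as a cause; it will not help if the server itself is closing the connection.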
I am having some really hard times trying to figure out how to webscrape making multiple requests to the same website. I have to web scrape 3000 products from a website. That implies making various requests to that server (for example searching the product, clicking on it, going back to the home page) 3000 times.
I am using Selenium. If I launch only one instance of my Firefox webdriver I don't get a MaxRetryError, but as the search goes on the webdriver gets slower and slower, and when the program reaches about half of the searches it stops responding. I looked it up on some forums and found that this is caused by browser memory issues. So I tried quitting and reinstantiating the webdriver every n seconds (I tried 100, 200 and 300 seconds), but when I do so I get the MaxRetryError because of too many requests to that URL using the same session.
I then tried making the program sleep for a minute when the exception occurs but that hasn't worked (I am only able to make another search and then an exception is again thrown, and so on).
I am wondering if there is any workaround for this kind of issue.
It might be using another library, a way of changing the IP or session dynamically, or something like that.
P.S. I would rather keep working with selenium if possible.
This error is normally raised if the server determines a high request rate from your client.
As you mentioned, the server bans your IP from making further requests so you can get around that by using some available technologies. Look into Zalenium and also see here for some other possible ways.
Another possible (but tedious) way is to use a number of browser instances to make the call, for example, an answer from here illustrates that.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

urlArr = ['https://link1', 'https://link2', '...']
for url in urlArr:
    chrome_options = Options()
    chromedriver = webdriver.Chrome(executable_path='C:/Users/andre/Downloads/chromedriver_win32/chromedriver.exe', options=chrome_options)
    # Used as a context manager, the driver calls quit() when the block exits,
    # so no explicit close() or quit() is needed afterwards.
    with chromedriver as browser:
        browser.get(url)
        # your task
        # Note: close() would close only the current Chrome window,
        # while quit() closes all of the open windows.
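If the failures really do come from the server throttling a high request rate, another thing to try is spacing out retries with growing delays instead of a fixed one-minute sleep. Below is a minimal sketch of exponential backoff. The helper call_with_backoff and its parameters are hypothetical names for illustration, and it catches every exception for brevity, where real code would catch only the specific error seen (for example the MaxRetryError from the question).

```python
import time


def call_with_backoff(action, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Delays grow as 1x, 2x, 4x, ... of base_delay.
            sleep(base_delay * (2 ** attempt))
```

With a driver in hand this could be used as call_with_backoff(lambda: browser.get(url), base_delay=30). Backoff is unlikely to help if the server has banned the IP outright.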
I am making a simple bot with selenium that will like, comment and message people on certain intervals.
I am using chrome web driver:
browser = webdriver.Chrome()
I am on an x64 Linux system; the distro is Ubuntu 15.04, and I am running Python 3 from the terminal.
This works well, but it's pretty slow. I know that as my code progresses, testing the app will become a pain. I've looked into this already and know it may have something to do with the proxy settings.
I am clueless when it comes to this type of stuff.
I fiddled with my system settings and changed my proxy settings to not require a connection, but nothing changed.
I notice when the driver loads, I see 'Establishing secure connection' for a few seconds in the browser window. I feel this is a culprit.
Also, 'establishing host' shows up multiple times. I'd say it takes about 5-8 seconds just to get a page.
login_url = 'http://www.skout.com/login'
browser.get(login_url)
In what ways can I speed up chrome driver, and is it proxy settings? It could definitely be something else.
Thanks for your time.
Chrome webdriver can be clunky and a bit slow to initialize as it is spawning a fresh instance every time you call the Webdriver object.
If speed is of the utmost importance I might recommend investing some time into looking at a headless alternative such as PhantomJS. This can save a significant amount of time if you are running multiple tests or instances of your application.
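PhantomJS is no longer actively developed, so a closely related option today is running Chrome itself without a visible window. The sketch below swaps in headless Chrome for PhantomJS; it is not what the answer above used. The function names are hypothetical, the window size is an arbitrary assumption, and Selenium is imported lazily so the argument list can be inspected without it installed.

```python
def headless_chrome_arguments(width=1280, height=800):
    """Return command-line switches for running Chrome without a visible window."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_chrome():
    """Start headless Chrome; Selenium is imported here, not at module load."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Whether this is actually faster for a given site is something to measure; a slow "Establishing secure connection" step may be a network or proxy delay that headless mode does not change.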
I am trying to scrape data from the URLs below, but Selenium fails at driver.get(url). Sometimes the error is [Errno 104] Connection reset by peer, sometimes [Errno 111] Connection refused. On rare days it works just fine, and on my Mac with a real browser the same spider works fine every single time. So this isn't related to my spider.
I have tried many solutions, like waiting for selectors on the page, implicit waits, using selenium-requests to pass proper request headers, etc. But nothing seems to work.
http://www.snapdeal.com/offers/deal-of-the-day
https://paytm.com/shop/g/paytm-home/exclusive-discount-deals
I am using python, selenium & headless Firefox webdriver to achieve this. The os is centos 6.5.
Note: I have many AJAX-heavy pages that get scraped successfully; some are listed below.
http://www.infibeam.com/deal-of-the-day.html, http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals
Already spent many days trying to debug the issue with no luck. Any help would be appreciated.
After days of wrestling with this issue, I finally found the cause. Writing it here for the benefit of the community. The headless browser was failing due to a lack of RAM on the server; the strange error messages from webdriver were a real PITA.
The server had been running for 60 days straight without a reboot; rebooting it did the trick. After increasing the swap threefold, I have not faced the issue for the past few days. I also scheduled a task to clean up the page caches (http://www.yourownlinux.com/2013/10/how-to-free-up-release-unused-cached-memory-in-linux.html).
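Since the root cause here was memory pressure rather than anything in the spider, a cheap guard is to check available memory before launching the browser. This is a minimal sketch that reads /proc/meminfo on Linux. The function names and the 512 MB threshold are assumptions for illustration, and the MemFree + Cached fallback is only a rough estimate for older kernels that do not report MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer} (most fields are in kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_kb(meminfo):
    """Use MemAvailable if the kernel reports it, else estimate MemFree + Cached."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    return meminfo.get("MemFree", 0) + meminfo.get("Cached", 0)


def enough_memory(min_free_kb=512 * 1024, path="/proc/meminfo"):
    """Return True if the estimated available memory meets the threshold."""
    with open(path) as handle:
        return available_kb(parse_meminfo(handle.read())) >= min_free_kb
```

A scraper could call enough_memory() before each driver start and log a clear message instead of surfacing a confusing connection error.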
Found this question while looking for similar error.
Looks like it's a bug in Selenium 3.8.1 and 3.9.0.
https://github.com/SeleniumHQ/selenium/issues/5296
Downgrading to 3.8.0 solves this problem.
I have been using Selenium and chromedriver (python3) for scraping purposes for some time now. With the latest Google Chrome update I had to deal with two issues.
1) Error on webdriver launch:
Solution: I had to add "no-sandbox" argument.
chrome_options.add_argument('--no-sandbox')
2) [Errno 104] Connection reset by peer:
Solution: There seems to be a problem with sockets and HTTP requests. Either the webpage content is too big or you don't give the page enough time to load. At least that's what I thought.
I set the maximum page load time to 60 seconds and it seems to be working fine.
driver.set_page_load_timeout(60)
I added a small delay between webdrivers initialisations which also seems to help.
time.sleep(0.5)
The IE webdriver opens the IE browser, which starts to load localhost and then stops (i.e. it never starts loading the page). When the browser stops loading, it shows the message 'Initial start page for webdriver server'. The problem does not occur every time I execute the test case, which makes it difficult to identify the cause. What I have noticed is that when this issue occurs, the URL takes ~25 seconds to load manually on the same machine. When the issue does not occur, the URL loads within 3 seconds.
All security settings are the same (Protected Mode enabled across all zones)
Enhanced Protected Mode is disabled
IE version 11
the URL is added as a trusted site.
Any clue why it does not load the URL sometimes?
I would try disabling IE native events. Sorry that I cannot provide the Python syntax right away; the following is C#, which should be fairly easy to convert.
var ieOptions = new InternetExplorerOptions
{ EnableNativeEvents = false };
ieOptions.EnsureCleanSession = true;
driver = new InternetExplorerDriver(ieOptions);
Use remote driver with desired cap (pageLoadStrategy)
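As a rough illustration of that suggestion, the sketch below builds a capabilities dictionary with pageLoadStrategy set to one of the three values listed in the release notes quoted in this answer. The helper ie_capabilities is a hypothetical name, and the commented usage assumes an older Selenium 2.x/3.x Python client (which accepted desired_capabilities) and a server at an assumed local address.

```python
VALID_PAGE_LOAD_STRATEGIES = ("normal", "eager", "none")


def ie_capabilities(page_load_strategy="normal"):
    """Build a capabilities dict for the IE driver with a page load strategy."""
    if page_load_strategy not in VALID_PAGE_LOAD_STRATEGIES:
        raise ValueError(
            "pageLoadStrategy must be one of %r" % (VALID_PAGE_LOAD_STRATEGIES,)
        )
    return {
        "browserName": "internet explorer",
        "pageLoadStrategy": page_load_strategy,
    }


# Hypothetical usage with a Selenium 2.x/3.x client and a running server:
#   from selenium import webdriver
#   driver = webdriver.Remote(
#       command_executor="http://127.0.0.1:4444/wd/hub",
#       desired_capabilities=ie_capabilities("eager"),
#   )
```

The helper raises on an unknown value rather than passing it through, because the driver would otherwise fall back to "normal" silently.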
Release notes from seleniumhq.org. Note that we have to use version 2.46 for the jar, iedriverserver.exe and python client driver in order to have things work correctly. It is unclear why 2.45 does not work given the release notes below.
v2.45.0.2
Updates to JavaScript automation atoms.
Added pageLoadStrategy to IE driver. Setting a capability named
pageLoadStrategy when creating a session with the IE driver will now change
the wait behavior when navigating to a new page. The valid values are:
"normal" - Waits for document.readyState to be 'complete'. This is the
default, and is the same behavior as all previous versions of
the IE driver.
"eager" - Will abort the wait when document.readyState is
'interactive' instead of waiting for 'complete'.
"none" - Will abort the wait immediately, without waiting for any of
the page to load.
Setting the capability to an invalid value will result in use of the
"normal" page load strategy.
This hasn't been updated for a while, but I recently had a very similar issue: IEDriverServer would eventually open the page under test, but in most cases it just got stuck on the initial page of WebDriver.
The root cause (in my case) turned out to be the startup setting of IE. I had "Start with tabs from the last session" enabled; when I changed it back to "Start with home page", the driver started to work like a charm, opening the page under test on 100% of tries.