python: Selenium webdriver and hanging proxy issue - python

I am trying to understand how to handle cases where calls performed through proxies hang. For example, I have this code:
from selenium import webdriver
from selenium.webdriver import FirefoxProfile

def call_with_proxy(ip, port):
    profile = FirefoxProfile()
    profile.set_preference('network.proxy.type', 1)  # 1 = manual proxy configuration
    profile.set_preference('network.proxy.socks', ip)
    profile.set_preference('network.proxy.socks_port', port)
    profile.update_preferences()
    driver = webdriver.Firefox(profile)
    driver.get("http://somewebsite.com")
The proxy is taken from free proxies list here https://hidemyass.com/proxy-list/
Sometimes everything works and I get the page I am requesting. But sometimes I get a blank Firefox page (where I can see some elements of the website being loaded, e.g. CSS), and this lasts for a very long time; e.g. the session is still not closed even after 10 minutes of waiting. Is there a way to automatically close the browser if, for example, the page has not loaded within some time, or the test I am running has stopped executing (for some reason related to proxies)?

In java we have:
webDriver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);
From the doc:
pageLoadTimeout
WebDriver.Timeouts pageLoadTimeout(long time, java.util.concurrent.TimeUnit unit)
Sets the amount of time to wait for a page load to complete before throwing an error. If the timeout is negative, page loads can be indefinite.
Parameters:
time - The timeout value.
unit - The unit of time.
Returns:
A Timeouts interface.
Quick Googling shows:
webDriver.set_page_load_timeout(30)
for Python. Try this in try-catch (or try-except in your case)
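A minimal sketch of that try/except, assuming the driver object exposes set_page_load_timeout(), get() and quit() as Selenium's Python WebDriver does. The helper name get_or_quit is made up for illustration, and the ImportError fallback is only there so the snippet can be read and run without Selenium installed:

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:  # assumption: fallback so the sketch runs without Selenium installed
    class TimeoutException(Exception):
        pass


def get_or_quit(driver, url, timeout_s=30):
    """Load url; if the page load exceeds timeout_s, quit the browser and return False.

    get_or_quit is a hypothetical helper name, not part of Selenium.
    """
    driver.set_page_load_timeout(timeout_s)
    try:
        driver.get(url)
    except TimeoutException:
        # The page did not finish loading in time (e.g. a dead proxy):
        # close all browser windows and end the session.
        driver.quit()
        return False
    return True
```

In the question's function this would replace the bare driver.get(...) call, e.g. get_or_quit(driver, "http://somewebsite.com", 30), so a hanging proxy leads to a closed browser instead of a session that stays open indefinitely.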

Implement a heartbeat system using queues or other active runtime objects (e.g. a web listener). If you know the maximum runtime for the site script as a whole, you can use Selenium Grid-like functionality.
If the times on the site vary and you are only worried about the initial load time, a heartbeat system is the only way I can think of.
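One possible reading of the heartbeat idea, sketched with the standard library only: the script "beats" whenever it makes progress, and if no beat arrives within a time limit a callback fires. The class and method names (Watchdog, beat, stop) are invented for this sketch, and passing driver.quit as the callback is an assumption; whether quit() called from a second thread reliably unblocks a hung get() may depend on the driver, so treat this as a starting point.

```python
import threading


class Watchdog:
    """Calls on_timeout() if beat() is not called again within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._timer = None
        self._lock = threading.Lock()

    def beat(self):
        """Signal progress: cancel the running countdown and start a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout_s, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive for the timer
            self._timer.start()

    def stop(self):
        """Cancel the countdown, e.g. once the script has finished normally."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Hypothetical usage: dog = Watchdog(60, driver.quit), then dog.beat() before driver.get(...) and after each step that completes, and dog.stop() at the end.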

Related

(Selenium) Running many firefox browser with less performance [duplicate]

I am using selenium with Firefox to automate some tasks on Instagram. It basically goes back and forth between user profiles and notifications page and does tasks based on what it finds.
It has one infinite loop that makes sure that the task keeps on going. I have sleep() function every few steps but the memory usage keeps increasing. I have something like this in Python:
while True:
    expected_conditions()
    ...doTask()
    driver.back()
    expected_conditions()
    ...doAnotherTask()
    driver.forward()
    expected_conditions()
I never close the driver because that would slow the program down a lot, as it has a lot of queries to process. Is there any way to keep the memory usage from increasing over time without closing or quitting the driver?
EDIT: Added explicit conditions but that did not help either. I am using headless mode of Firefox.
Well, this is a serious problem I had been struggling with for some days, but I have found a solution. You can add some flags to optimize your memory usage.
# Assumption: these are Chromium command-line switches, so Options is taken to be the Chrome one.
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument('--no-sandbox')
options.add_argument('--disable-application-cache')
options.add_argument('--disable-gpu')
options.add_argument("--disable-dev-shm-usage")
These are the flags I added. Before I added them, RAM usage kept increasing, and once it crossed 4 GB (my machine has 8 GB) the machine froze. After I added these flags, memory usage did not cross 500 MB. And as DebanjanB answers, if you are running a for loop or a while loop, try adding a few seconds of sleep after each iteration; it gives some time to kill the unused threads.
To start with, Selenium has very little control over the amount of RAM used by Firefox. As you mentioned, the browser client (Firefox) goes back and forth between user profiles and the notifications page on Instagram and does tasks based on what it finds, which is too broad for a single use case. So the first and foremost task would be to break up the infinite loop into smaller tests.
time.sleep()
Inducing time.sleep() virtually puts a blanket over the underlying issue. When using Selenium and WebDriver to execute tests through your automation framework, calling time.sleep() without any specific condition defeats the purpose of automation and should be avoided at all costs. As per the documentation:
time.sleep(secs) suspends the execution of the current thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.
You can find a detailed discussion in How to sleep webdriver in python for milliseconds
Analysis
There were previous instances when Firefox consumed about 80% of the RAM.
However, as per this discussion, some users feel that the more memory is used the better, because it means you don't have RAM wasted. Firefox uses RAM to make its processes faster, since application data is transferred much faster in RAM.
Solution
You can implement either/all of the generic/specific steps as follows:
Upgrade Selenium to current levels Version 3.141.59.
Upgrade GeckoDriver to GeckoDriver v0.24.0 level.
Upgrade Firefox version to Firefox v65.0.2 levels.
Clean your Project Workspace through your IDE and Rebuild your project with required dependencies only.
If your base Web Client version is too old, then uninstall it and install a recent GA and released version of Web Client.
Some extensions allow you to block such unnecessary content, as an example:
uBlock Origin allows you to hide ads on websites.
NoScript allows you to selectively enable and disable all scripts running on websites.
To open the Firefox client with an extension you can download the extension i.e. the XPI file from https://addons.mozilla.org and use the add_extension(extension='webdriver.xpi') method to add the extension in a FirefoxProfile as follows:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.add_extension(extension='extension_name.xpi')
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'C:\path\to\geckodriver.exe')
If your tests don't require the CSS, you can disable it by following this discussion.
Use Explicit Waits or Implicit Waits.
Use driver.quit() to close all the browser windows and terminate the WebDriver session because if you do not use quit() at the end of the program, the WebDriver session will not be closed properly and the files will not be cleared off memory. And this may result in memory leak errors.
Creating a dedicated Firefox profile and reusing it for every test run should eventually improve execution performance. Otherwise a new profile is created on each run and cache data is stored in it, and if driver.quit() somehow does not get called before a failure, each run leaves behind another profile with cached information, which consumes memory.
// ------------ Creating a new firefox profile -------------------
1. If Firefox is open, close Firefox.
2. Press Windows +R on the keyboard. A Run dialog will open.
3. In the Run dialog box, type in firefox.exe -P
Note: You can use -P or -ProfileManager(either one should work).
4. Click OK.
5. Create a new profile and set its location to the RAM drive.
// ----------- Associating Firefox profile -------------------
ProfilesIni profile = new ProfilesIni();
FirefoxProfile myprofile = profile.getProfile("automation_profile");
WebDriver driver = new FirefoxDriver(myprofile);
Please share execution performance with community if you plan to implement this way.
There is no fix for that as of now.
I suggest you use the driver.close() approach.
I was also struggling with the RAM issue. What I did was count the loop iterations, and when the count reached a certain number (for me it was 200) I called driver.close(), started the driver again, and reset the count.
This way I did not need to close the driver on every iteration, so it has less effect on performance too.
Try this. Maybe it will help in your case too.
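A rough sketch of that restart-every-N-iterations idea. The names run_with_periodic_restart, make_driver and tasks are made up for illustration: make_driver is any zero-argument callable that returns a fresh driver (for example lambda: webdriver.Firefox()), and tasks is any iterable of callables that take the driver. Note one swap: the sketch calls quit() rather than close(), since quit() ends the whole session while close() only closes the current window.

```python
def run_with_periodic_restart(make_driver, tasks, restart_every=200):
    """Run task(driver) for each task, recreating the driver every restart_every tasks.

    Returns the number of tasks that ran. All names here are hypothetical.
    """
    driver = make_driver()
    done = 0
    try:
        for task in tasks:
            # After every restart_every completed tasks, drop the old browser
            # (and whatever memory it accumulated) and start a fresh one.
            if done and done % restart_every == 0:
                driver.quit()
                driver = make_driver()
            task(driver)
            done += 1
    finally:
        # Always end the last session, even if a task raised.
        driver.quit()
    return done
```

Because tasks can be a generator, this also fits an endless loop: the generator keeps yielding work and the browser is recycled every restart_every steps.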

How can I debug an iOS Selenium test in Python

I'm trying to run an iOS Selenium test in debug mode. I'm using Appium, an iOS simulator (Xcode), and writing the tests in Python.
Once the code reaches my breakpoint I can see all the variables, but a few seconds later, instead of seeing their values, I get the following exception:
A session is either terminated or not started
This is happening even though I can see the simulator is still running.
I've tried looking online but couldn't find a solution, Can you please help?
Thanks!
You might want to increase newCommandTimeout Desired Capability value to something which will allow you to inspect the elements values. The relevant code line to increase the timeout to 5 minutes would be:
desired_caps['newCommandTimeout'] = '300'
Full initialization routine just in case:
from appium import webdriver
desired_caps = {}
desired_caps['platformName'] = 'iOS'
desired_caps['platformVersion'] = '12.3'
desired_caps['automationName'] = 'xcuitest'
desired_caps['deviceName'] = 'iPhone SE'
desired_caps['newCommandTimeout'] = '300'
driver = webdriver.Remote('http://localhost:4723/wd/hub', desired_caps)
This way Appium will wait for a new command from the client (your code) for 5 minutes before considering the client idle and terminating the session. That should be enough to enable debugging; feel free to increase it if needed.
You can also consider switching to Appium Studio which makes your life easier when it comes to inspecting the mobile layout, managing iOS devices/provisioning profiles, generating unique XPath locators for elements automatically and having an extra set of Desired Capabilities allowing you to faster deal with edge cases

Runaway memory usage with Selenium using PhantomJS

I have written a script in Python that iterates over a long list of webpages and gathers data, using Selenium and PhantomJS as the webdriver (since I'm running it on a remote terminal machine running Linux, and needed to use a headless browser). For short jobs, e.g. where it has to iterate over a few pages, there are no issues. However, for longer jobs, where it has to iterate through a longer list of pages, I see the memory usage increase dramatically over time, each time a new page is loaded. Eventually after about 20 odd pages the script is killed due to memory overflow.
Here is how I initialize my browser -
from selenium import webdriver
url = 'http://someurl.com/'
browser = webdriver.PhantomJS()
browser.get(url)
The page has next buttons and I iterate through the pages by finding the xpath for the 'Next >' button -
next_xpath = "//*[contains(text(), 'Next >')]"
next_link = browser.find_element_by_xpath(next_xpath)
next_link.click()
I have tried clearing cookies and cache for the PhantomJS browser in the following ways -
browser.get('javascript:localStorage.clear();')
browser.get('javascript:sessionStorage.clear();')
browser.delete_all_cookies()
However none of these has had any impact on memory usage. When I use the Firefox driver, on my local machine it works without any issues, though it should be noted that my local machine has much more memory than the remote server.
My apologies if any crucial information is missing. Please feel free to let me know how I can make my question more comprehensive.
If Headless browser is working for you, I would like to suggest a solution which helped me solve similar memory issue.
Use AWS Lambda as your server and the parallel execution library xdist, and bingo, you should never face a memory issue, as Lambda is a managed service. Make sure to upload your captured data to S3 and clean up the temp directory on Lambda. (I have already implemented this in one of my projects and it works like a charm.)

Explicit Wait not timing out consistently in Selenium Python

I have the following code using Selenium in Python 3:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

profile = webdriver.FirefoxProfile()
profile.set_preference('webdriver.load.strategy', 'unstable')
browser = webdriver.Firefox(profile)
browser.set_page_load_timeout(10)
url = 'my_url'
while True:
    try:
        st = time.time()
        browser.get(url)
        print('Finished get!')
        time.sleep(2)
        wait = WebDriverWait(browser, 10)
        element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[my_attr="my_attr"]')))
        print('Success after {} seconds.'.format(round(time.time()-st)))
        break
    except:
        print('Timed out after {} seconds.'.format(round(time.time()-st)))
        print('Reloading')
        continue
From my understanding, using the explicit wait here (even with the unstable load strategy and page load timeout), what should happen is that the page should load, it should look for the element specified, and if either the page doesn't load within 10 seconds or the element is not found within 10ish seconds, it should time out and reload again (because of the try/except clause with the while loop).
However, what I'm finding is that it's not timing out consistently. For example, I've had instances where the loading times out after 10ish seconds the first time around, but once it reloads, it doesn't time out and instead "succeeds" after like 140 seconds. Or sometimes it doesn't time out at all and just keeps running until it succeeds. Because of the unstable load strategy, I don't think the page load itself is ever timing out (more specifically, the 'Finished get!' message always prints). But the explicit wait here that I specified also does not seem to be consistent. Is there something in my code that is overriding the timeouts? I want the timeouts to be consistent such that if either the page doesn't load or the element isn't located within 10ish seconds, I want it to timeout and reload. I don't ever want it to go on for 100+ seconds, even if it succeeds.
Note that I'm using the unstable webdriver load strategy here because the page I'm going to takes forever to completely load so I want to go straight through the code once the elements I need are found without needing the entire page to finish loading.
After some more testing, I have located the source of the problem. It's not that the waits are not working. The problem is that all the time is being taken up by the locator. I discovered this by essentially writing my own wait function and using the .find_element_by_css_selector() method, which is where all the runtime is occurring when it takes 100+ seconds. Because of the nature of my locator and the complexity of the page source, it's taking 100+ seconds sometimes for the locator to find the element when the page is nearly fully loaded. The locator time is not factored into the wait time. I presume that the only "solution" to this is to write a more efficient locator.
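To illustrate why a slow locator can overrun the timeout, here is a minimal hand-rolled wait, roughly in the spirit of the one described above (the name wait_for and its parameters are invented for this sketch). The deadline is only checked between calls, so a single lookup that takes 100+ seconds is never interrupted; as far as I can tell, a polling wait such as WebDriverWait behaves the same way in this respect.

```python
import time


def wait_for(find, timeout_s=10.0, poll_s=0.5):
    """Poll find() until it returns something truthy or timeout_s has elapsed.

    Returns (result, slowest_call_s) so the cost of the lookup itself is visible.
    """
    deadline = time.monotonic() + timeout_s
    slowest = 0.0
    while True:
        start = time.monotonic()
        result = find()  # nothing interrupts this call, however long it takes
        slowest = max(slowest, time.monotonic() - start)
        if result:
            return result, slowest
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "gave up after {:.1f}s; slowest lookup took {:.1f}s".format(timeout_s, slowest)
            )
        time.sleep(poll_s)
```

Hypothetical usage with the question's browser: wait_for(lambda: browser.find_elements_by_css_selector('div[my_attr="my_attr"]'), 10). If slowest_call_s comes back far above the timeout, the locator is the bottleneck rather than the wait.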

Reload time & retries in selenium for a url

I am working on selenium with python for downloading file from a url.
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
try:
    browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
    browser.find_element
    browser.find_element_by_id('exportpt').click()
    browser.find_element_by_id('exporthlgt').click()
except:
    pass
I want to set a timeout for this program. That is, if the URL has not loaded within 60 seconds because of a network issue, it should retry every 60 seconds, and after 3 tries it should move on.
How can I achieve such in this code?
Thanks
You could use browser.implicitly_wait(60)
WebDriver.implicitly_wait
There is nothing built in to do this. However, I wouldn't have said it would be too hard.
Just use an explicit wait to find a particular element that should be there when the page loads. Set the timeout to be 60 seconds on this explicit wait.
Wrap this in a loop that executes up to three times. To avoid it running three times unnecessarily, put in a break statement when the explicit wait actually runs without any issue.
That means it'll run up to three times, waiting 60 seconds a time, and once it's successful it'll exit the loop. If it isn't successful after all of that, then it'll crash.
Note: I've not actually tried this but it's just a logical solution!
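The loop described above could look roughly like this. It is only a sketch: retry is a made-up helper, and the action passed to it is assumed to be something like "load the page, then run a 60-second explicit wait", raising an exception when the wait times out.

```python
import time


def retry(action, attempts=3, delay_s=0.0, retry_on=(Exception,)):
    """Call action() up to attempts times and return its first successful result.

    If every attempt raises one of retry_on, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()  # success: leave the loop immediately
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)  # optional pause before the next try
    raise last_error


# Hypothetical usage with the question's browser object (not executed here):
# def load_and_export():
#     browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
#     WebDriverWait(browser, 60).until(
#         EC.element_to_be_clickable((By.ID, "exportpt"))).click()
# retry(load_and_export, attempts=3)
```

To "go ahead" after three failures instead of crashing, the caller can wrap the retry(...) call in its own try/except and continue with the rest of the program.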
