How to properly close a Selenium WebDriver instance in python? - python

I am writing a script that will check if the proxy is working. The program should:
1. Load the proxy from the list (txt).
2. Go to any page (for example wikipedia)
3. If the page has loaded (even not completely) it saves the proxy data to another txt file.
All of this must run in a loop, and the script must also check whether the browser displayed an error. My problem is that the previous browser is not closed on each iteration, so after several loops several browsers are open.
P.S. I replaced the loop index with a random number.
from selenium import webdriver
import random
from configparser import ConfigParser
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import traceback
while 1:
    ini = ConfigParser()
    ini.read('liczba_proxy.ini')
    random.random()
    liczba_losowa = random.randint(1, 999)
    f = open('user-agents.txt')
    lines = f.readlines()
    user_agent = lines[liczba_losowa]
    user_agent = str(user_agent)
    s = open('proxy_list.txt')
    proxy = s.readlines()
    i = ini.getint('liczba', 'liczba')
    prefs = {"profile.managed_default_content_settings.images": 2}
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=%s' % proxy[liczba_losowa])
    chrome_options.add_argument(f'user-agent={user_agent}')
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path='C:\Python\Driver\chromedriver.exe')
    driver.get('https://en.wikipedia.org/wiki/Replication_error_phenotype')
    def error_catching():
        print("error")
        driver.stop_client()
        traceback.print_stack()
        traceback.print_exc()
        return False
    def presence_of_element(driver, timeout=5):
        try:
            w = WebDriverWait(driver, timeout)
            w.until(EC.presence_of_element_located((By.ID, 'siteNotice')))
            print('work')
            driver.stop_client()
            return True
        except:
            print('not working')
            driver.stop_client()
            error_catching()

Without commenting on your code design:
To close a driver instance, use driver.close() or driver.quit() instead of driver.stop_client().
The first closes the browser window that currently has focus.
The second closes all browser windows and ends the WebDriver session gracefully.
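A minimal sketch of that advice, assuming only that the driver object has a quit() method, as Selenium's WebDriver does. The helper name managed_driver and its factory argument are made up for illustration and are not part of Selenium. Wrapping the session in try/finally means quit() runs on every loop iteration, even when the page fails to load:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() when the block ends.

    `factory` is a hypothetical zero-argument callable that builds the
    driver, for example a function returning webdriver.Chrome(...).
    """
    driver = factory()
    try:
        yield driver
    finally:
        # quit() closes every window and ends the WebDriver session,
        # so no browser is left behind between loop iterations.
        driver.quit()
```

Inside the loop this could be used as "with managed_driver(make_chrome) as driver:" followed by driver.get(url), where make_chrome is your own function that builds the Chrome driver with the proxy options.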

Use
driver.quit()
Note: I'm pretty sure you should not write test cases like that. "while 1"? So your test will never end?
I suggest setting up your tests as TestCases and running them through a TestSuite, which gives you a single report of what passed and what failed. You could also set up a cron job to run it periodically.
Here is a simple example of mine using test cases with Django and splinter (splinter is built on top of Selenium):
https://github.com/Diegow3b/python-django-basictestcase/blob/master/myApp/tests/test_views.py

Related

Selenium Chrome (Python): Browser freezes - timeout and quit browser in this case

I use Selenium Chrome to extract information from online sources. Basically, I loop over a list of URLs (stored in mylinks) and load the webpages in the browser as follows:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("window-size=1200,800")
browser = webdriver.Chrome(chrome_options=options)
browser.implicitly_wait(30)
for x in mylinks:
    try:
        browser.get(x)
        soup = BeautifulSoup(browser.page_source, "html.parser")
        city = soup.find("div", {"class": "city"}).text
    except:
        continue
My problem is that the browser "freezes" at some point. I know that this problem is caused by the webpage. As a consequence, my routine stops because the browser no longer responds. browser.implicitly_wait(30) does not help here either; neither an explicit nor an implicit wait solves the problem.
I want to "time out" the problem, meaning that I want to quit() the browser after x seconds (in case the browser freezes) and restart it.
I know that I could use a subprocess with a timeout, like:
import subprocess
def startprocess(filepath, waitingtime):
    p = subprocess.Popen("C://mypath//" + filepath)
    try:
        p.wait(waitingtime)
    except subprocess.TimeoutExpired:
        p.kill()
However, for my task this solution would be second-best.
Question: is there an alternative way to timeout the browser.get(x) step in the loop above (in case the browser freezes) and to continue to the next step?
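One possible direction, sketched under assumptions rather than taken from the thread: Selenium's set_page_load_timeout() makes browser.get() raise a TimeoutException when a page takes too long to load, after which the browser can be quit and recreated. Whether the timeout still fires when the browser process itself is completely frozen is not guaranteed. The names visit_all, make_browser and handle below are hypothetical:

```python
def visit_all(urls, make_browser, handle, timeout=30, errors=(Exception,)):
    """Visit each URL; when one fails, restart the browser and move on.

    make_browser: hypothetical zero-argument factory returning a WebDriver.
    handle: hypothetical callback receiving (url, page_source).
    errors: exception types that trigger a restart; with Selenium you
        would typically pass (TimeoutException, WebDriverException).
    Returns the list of URLs that could not be processed.
    """
    failed = []
    browser = make_browser()
    browser.set_page_load_timeout(timeout)
    try:
        for url in urls:
            try:
                browser.get(url)
                handle(url, browser.page_source)
            except errors:
                failed.append(url)
                # Discard the possibly stuck browser and start a fresh one.
                try:
                    browser.quit()
                except Exception:
                    pass  # a frozen browser may not respond to quit()
                browser = make_browser()
                browser.set_page_load_timeout(timeout)
    finally:
        browser.quit()
    return failed
```

The parsing from the question (BeautifulSoup on page_source) would go into the handle callback, so a page that fails to load or parse costs one restart instead of stopping the whole run.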

Unable to click on signs on a map

I've written a script in Python with Selenium to click on each of the signs shown on a map. However, when I execute the script, it throws a timeout exception upon reaching the line wait.until(EC.staleness_of(item)).
Before hitting that line, the script should have clicked once, but it did not. How can I click on all the signs on that map one after another?
This is the site link.
This is my code so far (perhaps, I'm trying with the wrong selectors):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.findapetwash.com/"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 15)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#map .gm-style"))):
    item.click()
    wait.until(EC.staleness_of(item))
driver.quit()
The signs on that map look like this: [image]
P.S. I know that this is their API, https://www.findapetwash.com/api/locations/getAll/, which I could use to get the JSON content, but I would like to stick to the Selenium way. Thanks.
I know you wrote that you don't want to use the API, but using Selenium to get the locations from the map markers seems like overkill for this. Why not call their web service using requests and parse the returned JSON instead?
Here is a working script:
import requests
import json
api_url = 'https://www.findapetwash.com/api/locations/getAll/'
class Location:
    # 'data' is the dict for one location (renamed so it does not shadow the json module)
    def __init__(self, data):
        self.id = data['id']
        self.user_id = data['user_id']
        self.name = data['name']
        self.address = data['address']
        self.zipcode = data['zipcode']
        self.lat = data['lat']
        self.lng = data['lng']
        self.price_range = data['price_range']
        self.photo = 'https://www.findapetwash.com' + data['photo']
def get_locations():
    locations = []
    response = requests.get(api_url)
    if response.ok:
        result_json = json.loads(response.text)
        for location_json in result_json['locations']:
            locations.append(Location(location_json))
        return locations
    else:
        print('Error loading locations')
        # Return an empty list so the loop below does not fail on error
        return []
if __name__ == '__main__':
    locations = get_locations()
    for l in locations:
        print(l.name)
Selenium
If you still want to go the Selenium way, then instead of waiting until all the elements are loaded, you could simply pause the script for a few seconds, or even a minute, to make sure everything has loaded. This should fix the timeout exception:
import time
driver.get(link)
# Wait 20 seconds
time.sleep(20)
For other possible workarounds, see the accepted answer here: Make Selenium wait 10 seconds
If for some reason you cannot use the API, you can click the signs one by one with Selenium. It is also possible to extract the information for each sign with Selenium without clicking on them.
Here is the code to click them one by one:
signs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.marker.marker--list")))
for sign in signs:
    driver.execute_script("arguments[0].click();", sign)
    # do something
Also try it without the wait; it will probably work.

Selenium : Python script taking a lot of time in Quora [duplicate]

So I'm trying to log in to Quora using Python and then scrape some stuff.
I'm using Selenium to log in to the site. Here's my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
driver.close()
Now the questions:
1. It took ~4 minutes to find and fill the login form, which is painfully slow. Is there something I can do to speed up the process?
2. When it did log in, how do I make sure there were no errors? In other words, how do I check the response code?
3. How do I save cookies with Selenium so I can continue scraping once I log in?
4. If there is no way to make Selenium faster, is there any other alternative for logging in? (Quora doesn't have an API.)
I had a similar problem with very slow find_element_xxx() calls in Python Selenium using the ChromeDriver. I eventually tracked the trouble down to a driver.implicitly_wait() call I made prior to my find_element_xxx() calls; when I took it out, my find_element_xxx() calls ran quickly.
Now, I know those elements were there when I made the find_element_xxx() calls, so I cannot imagine why implicitly_wait() should have affected the speed of those operations, but it did.
I have been there; Selenium is slow, though it may not be as slow as 4 minutes to fill a form. I then started using PhantomJS, which is much faster than Firefox since it is headless. After installing the latest PhantomJS, you can simply replace Firefox() with PhantomJS() in the webdriver line.
To check that you have logged in, you can assert on some element that is displayed only after login.
As long as you do not quit your driver, the cookies remain available for following links.
You can try using urllib and POST directly to the login URL. You can use cookiejar to save cookies. You can even save the cookie yourself; after all, a cookie is just a string in an HTTP header.
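The urllib and cookiejar suggestion above could look roughly like the sketch below. It uses only the Python 3 standard library; the function names, the login URL and the form field names are placeholders, since the thread does not give the real ones. Many sites also require tokens or JavaScript, so a plain POST may not be enough:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session(cookie_file):
    """Return (jar, opener); the opener stores received cookies in the jar."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return jar, opener


def post_login(opener, login_url, fields, timeout=30):
    """POST form fields to login_url and return the HTTP status code.

    login_url and the keys of `fields` are placeholders: the real values
    depend on the site's login form.
    """
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data, timeout=timeout) as response:
        return response.status
```

After a successful post_login(), calling jar.save(ignore_discard=True) writes the cookies to cookie_file, and jar.load(ignore_discard=True) restores them in a later run. The returned status code is also one way to check the response, which Selenium itself does not expose.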
You can speed up your form filling by using your own setAttribute method. Here is Java code for it:
public void setAttribute(By locator, String attribute, String value) {
    ((JavascriptExecutor) getDriver()).executeScript("arguments[0].setAttribute('" + attribute
            + "',arguments[1]);",
            getElement(locator),
            value);
}
Running the web driver headlessly should improve its execution speed to some degree.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')
browser = webdriver.Firefox(options=options)
browser.get('https://google.com/')
browser.close()
For Windows 7 and IEDriver with Python Selenium, ending the Windows command line and restarting it cured my issue.
I was having trouble with find_element...click() calls. They were taking a little over 30 seconds each. Here's the type of code I have, including timing of each call:
import time
timeStamp = time.time()
elem = driver.find_element_by_css_selector(clickDown).click()
print("1 took:", time.time() - timeStamp)
timeStamp = time.time()
elem = driver.find_element_by_id("cSelect32").click()
print("2 took:", time.time() - timeStamp)
That was recording about 31 seconds for each click. After ending the command line and restarting it (which also ends any IEDriverServer.exe processes), it was 1 second per click.
I have changed the locators and this works fast. I have also added cookie handling. Check the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pickle
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
wait = WebDriverWait(driver, 5)
username = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="email"]')))
password = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="password"]')))
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
wait.until(EC.presence_of_element_located((By.XPATH, '//span[text()="Add Question"]')))  # checking that the user is logged in
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))  # saving cookies
driver.close()
We have saved the cookies, and now we will apply them in a new browser:
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
    driver.add_cookie(cookie)
driver.get('http://www.quora.com/')
Hope this helps.

How can I make Selenium/Python wait for the user to login before continuing to run?

I'm trying to run a script in Selenium/Python that requires logins at different points before the rest of the script can run. Is there a way to make the script pause at the login screen and wait for the user to manually enter a username and password (for example, by waiting for the page title to change before continuing)?
This is my code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import unittest, time, re, getpass
driver = webdriver.Firefox()
driver.get("https://www.facebook.com/")
someVariable = getpass.getpass("Press Enter after You are done logging in")
driver.find_element_by_xpath('//*[@id="profile_pic_welcome_688052538"]').click()
Use WebDriverWait. For example, this performs a Google search and then waits for a certain element to be present before printing the result:
import contextlib
import selenium.webdriver as webdriver
import selenium.webdriver.support.ui as ui
with contextlib.closing(webdriver.Firefox()) as driver:
    driver.get('http://www.google.com')
    wait = ui.WebDriverWait(driver, 10)  # timeout after 10 seconds
    inputElement = driver.find_element_by_name('q')
    inputElement.send_keys('python')
    inputElement.submit()
    results = wait.until(lambda driver: driver.find_elements_by_class_name('g'))
    for result in results:
        print(result.text)
        print('-'*80)
wait.until will either return the result of the lambda function, or raise a selenium.common.exceptions.TimeoutException if the lambda function continues to return a falsy value after 10 seconds.
You can find a little more information on WebDriverWait in the Selenium book.
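The behaviour of wait.until described above can be pictured with a simplified re-implementation. This is a sketch of the idea, not Selenium's actual code: the function name until, the poll parameter and the use of the built-in TimeoutError are assumptions made for illustration.

```python
import time


def until(condition, driver, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. (Selenium raises its own TimeoutException instead.)
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

For the manual login in the question, the same pattern with the real class would be WebDriverWait(driver, 300).until(lambda d: d.title != login_title), where login_title is a variable holding the title of the login page, assuming the title changes once the user has logged in.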
from selenium import webdriver
import getpass  # <-- import this
def loginUser():
    # Open your browser, and point it to the login page
    someVariable = getpass.getpass("Press Enter after You are done logging in")  # <-- this is the second part
    # Here is where you put the rest of the code you want to execute
Then, whenever you want to run the script, call loginUser() and it does its thing.
This works because getpass.getpass() works just like input(), except that it does not echo any characters (it is meant for accepting passwords without showing them to everyone looking at the screen).
So here is what happens: your page loads, then everything stops. The user logs in manually, then goes back to the Python CLI and presses Enter.
