How to find an element ASAP, without waiting for Selenium page load [duplicate] - python

This question already has answers here:
How to make Selenium not wait till full page load, which has a slow script?
(2 answers)
Closed 3 years ago.
I want to use Selenium to find an element as soon as DOMContentLoaded fires.
How can findElement execute without waiting for the full page load?
var webdriver = require('selenium-webdriver'),
    By = webdriver.By,
    until = webdriver.until;

(async function main() {
    driver = await new webdriver.Builder().forBrowser('chrome').build();
    await driver.get('some url'); // waits until it throws a timeout error
    ele = await driver.findElement(By.id('username'));
    await ele.sendKeys('xxx');
})();
I tried
await driver.manage().setTimeouts({pageLoad: 3e3, script: 2e3})
but after catching the error, every subsequent promise also times out.
environment
nodejs
"selenium-webdriver": "^4.0.0-alpha.1"
chromedriver 73.0.3683.20
Finally, my Node.js solution:
var {Options} = require('selenium-webdriver/chrome'),
    {Builder, By, until, Capabilities} = require('selenium-webdriver'),
    driver;

(async function main() {
    driver = await new Builder()
        .withCapabilities(
            Options.chrome().setPageLoadStrategy('none')
        ).build();
})();

Selenium WebDriver supports three page-load strategies:
normal
This strategy causes Selenium to wait for the full page load (HTML content and subresources downloaded and parsed).
eager
This strategy causes Selenium to wait for the DOMContentLoaded event (HTML content downloaded and parsed only).
none
This strategy causes Selenium to return immediately after the initial page content is received (HTML content downloaded only).
You have to set your page load strategy to eager. Although Chrome did not support 'eager' at the time, you can set it to 'none' and then synchronize yourself by explicitly waiting for the elements you need.
Python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
option = Options()
option.set_capability("pageLoadStrategy", "eager")
driver = webdriver.Chrome(executable_path="chromedriver.exe", options=option)

Related

Blocking downloads on Chrome using Selenium with Python

I made a simple bot in Python using Selenium that searches for certain content on a website and returns a specific result to the user. However, the website has an ad such that every time I click a page element, or press ENTER to advance (or the bot does, in this case), it downloads a file. I only found out about this after running a few tests while improving the bot. I tried it manually and the same thing happened, so it's a problem with the website itself.
I'm guessing there's a way to block the downloads completely, since the file is being saved automatically. I don't think it makes much difference, but this is what triggers the download:
driver.find_element(By.ID, "hero-search").send_keys(Keys.ENTER)
And I can't go around that because I need to advance to the next page. So, is there a way to block this on selenium?
You can block the downloads by using the chrome preferences:
from selenium import webdriver

options = webdriver.ChromeOptions()
prefs = {
    "download.prompt_for_download": False,
    "download_restrictions": 3,  # 3 = block all downloads
}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=options)
driver.get("https://www.an_url.com")
driver.close()

Selenium Long Page Load in Chrome [duplicate]

This question already has answers here:
How to make Selenium not wait till full page load, which has a slow script?
(2 answers)
Closed 3 years ago.
I have built a scraper in Python 3.6 using Selenium and Scrapinghub Crawlera. I am trying to fetch this car listing and download its photos: https://www.cars.com/vehicledetail/detail/800885995/overview/ but the page just keeps loading for a long time. What I am trying to figure out is how to stop the browser from continuing to load after 4 minutes.
I have tried both explicit and implicit waits and neither has worked.
driver = webdriver.Chrome('/usr/bin/chromedriver',
                          desired_capabilities=capabilities,
                          options=chrome_options)
driver.implicitly_wait(180)
driver.get(url)
You need to set the maximum page-load time with driver.set_page_load_timeout().
If the page exceeds that loading time, the browser throws a TimeoutException; all you need to do is handle it:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome('/usr/bin/chromedriver',
                          desired_capabilities=capabilities,
                          options=chrome_options)
driver.set_page_load_timeout(time_to_wait)
try:
    driver.get(url)
except TimeoutException:
    pass  # do what you need here

Selenium / Python switch webdriver from headless to window mode [duplicate]

This question already has answers here:
How do I make Chrome Headless after I login manually
(2 answers)
Closed 4 years ago.
Is there any way to switch the Chrome webdriver from headless mode to window mode?
One idea that came to mind is to 'switch' the existing webdriver to non-headless mode.
Another idea: create a new webdriver instance (this time non-headless) with some sort of 'state' carried over from the old one, so the user's operations can continue. I don't know how to do that, or whether it is even possible.
import os
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(
    executable_path=os.path.join(os.getcwd(), 'chromedriver'),
    chrome_options=options,
)
driver.get('https://website.com')
try:
    driver.find_element_by_xpath('//h1').click()
except NoSuchElementException:
    print('You have to click it manually')
    # here I need the Chrome browser to be opened
    # so that I can click a link
print('The name of this thing is: ', end='')
print(driver.find_element_by_xpath("//h1[@class='name']").text)
If you need to open a new tab:
driver.execute_script("window.open()")
If you need to switch to the new one:
driver.switch_to.window(driver.window_handles[1])
Then you get the page:
driver.get('https://website.com')
At the end you can close it (the new one):
driver.close()
and switch back to the first window:
driver.switch_to.window(driver.window_handles[0])

Selenium : Python script taking a lot of time in Quora [duplicate]

So I'm trying to login to Quora using Python and then scrape some stuff.
I'm using Selenium to login to the site. Here's my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
driver.close()
Now the questions:
It took ~4 minutes to find and fill the login form, which is painfully slow. Is there something I can do to speed up the process?
Once it did log in, how do I make sure there were no errors? In other words, how do I check the response code?
How do I save cookies with Selenium so I can continue scraping once I have logged in?
If there is no way to make Selenium faster, is there any alternative for logging in? (Quora doesn't have an API.)
I had a similar problem with very slow find_element_* calls in Python Selenium using ChromeDriver. I eventually tracked the trouble down to a driver.implicitly_wait() call I made prior to my find_element_*() calls; when I took it out, my find_element_*() calls ran quickly.
Now, I know those elements were present when I made the find_element_*() calls, so I cannot imagine why the implicit wait should have affected the speed of those operations, but it did.
I have been there; Selenium is slow, though it should not be as slow as 4 minutes to fill a form. I started using PhantomJS, which is much faster than Firefox since it is headless. You can simply replace Firefox() with PhantomJS() in the webdriver line after installing the latest PhantomJS.
To check that you have logged in, you can assert on some element that is only displayed after login.
As long as you do not quit your driver, cookies remain available as you follow links.
You can try using urllib and posting directly to the login URL, using cookiejar to save the cookies. You can even simply save a cookie yourself; after all, a cookie is just a string in an HTTP header.
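The urllib + cookiejar idea above could be sketched like this (the login URL and form field names are placeholders I made up; inspect the real login form to find the actual endpoint and field names):

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_login_opener(cookie_path="cookies.txt"):
    # MozillaCookieJar can save/load cookies to a Netscape-format file.
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar

def login(opener, jar, url, email, password):
    # POST the credentials; any Set-Cookie headers land in the jar.
    data = urllib.parse.urlencode(
        {"email": email, "password": password}
    ).encode()
    with opener.open(url, data) as resp:
        status = resp.status  # 200 alone is not proof; check the body too
    jar.save(ignore_discard=True)  # persist session cookies to disk
    return status
```

On the next run, jar.load(ignore_discard=True) restores the saved cookies, so the opener is already "logged in" without touching the form again.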
You can speed up form filling with your own setAttribute method; here is Java code for it:
public void setAttribute(By locator, String attribute, String value) {
    ((JavascriptExecutor) getDriver()).executeScript(
        "arguments[0].setAttribute('" + attribute + "', arguments[1]);",
        getElement(locator),
        value);
}
Running the web driver headlessly should improve its execution speed to some degree.
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('-headless')
browser = Firefox(firefox_options=options)
browser.get('https://google.com/')
browser.close()
For Windows 7 and IEDriver with Python Selenium, ending the Windows command line session and restarting it cured my issue.
I was having trouble with find_element...click() calls taking 30-odd seconds each. Here's the type of code I have, including timing capture:
timeStamp = time.time()
elem = driver.find_element_by_css_selector(clickDown).click()
print("1 took:", time.time() - timeStamp)
timeStamp = time.time()
elem = driver.find_element_by_id("cSelect32").click()
print("2 took:", time.time() - timeStamp)
That was recording about 31 seconds per click. After ending the command line session and restarting it (which kills any IEDriverServer.exe processes), it was 1 second per click.
I have changed the locators and now it works fast. I have also added cookie handling. Check the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pickle
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
wait = WebDriverWait(driver, 5)
username = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="email"]')))
password = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="password"]')))
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
wait.until(EC.presence_of_element_located((By.XPATH, '//span[text()="Add Question"]'))) # checking that user logged in
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))  # saving cookies
driver.close()
We have saved cookies and now we will apply them in a new browser:
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
driver.add_cookie(cookie)
driver.get('http://www.quora.com/')
Hope this helps.

How to close the browser after completing a download?

How can I make the browser close only after the download completes?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get(any_url)
browser.find_element_by_xpath('//input[@value="Download"]').click()
# The program start downloading now.
# HERE WHAT IS THE CODE?
browser.quit()
I want to close the browser only after completing the download.
You may want to use the piece of code below right before you close the browser:
import time

time.sleep(5)  # gives the download time to finish before closing the
               # browser; increase to 10 or 15 seconds, or however long
               # the download takes, otherwise the script moves on
               # immediately
browser.quit()
You can use the pause command:
pause ( waitTime )
Wait for the specified amount of time (in milliseconds):
http://release.seleniumhq.org/selenium-core/1.0/reference.html#pause
Note that pause is a Selenium Core/IDE command; in Python the equivalent is time.sleep, which takes seconds:
import time
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(any_url)
browser.find_element_by_xpath('//input[@value="Download"]').click()
# The download starts now.
time.sleep(10)  # pause/sleep for 10 seconds
browser.quit()
This is an alternative way I did it in C#. Maybe you can use the same technique in Python.
public static string GetRequest(string url, bool isBinary = false) {
    // isBinary marks the file that will be downloaded.
    // Perform a synchronous GET request here and download the binary.
    // Python guide for GET requests: http://docs.python-requests.org/en/latest/user/quickstart/
}
browser = webdriver.Firefox();
browser.get(any_url);
elem = browser.findElement("locator");
GetRequest(elem.getAttribute('href'), true); // when this method returns, the download is done
browser.quit();
The trick I used was to open Firefox's download manager page and wait for an element that indicates the download has finished. Here is the Python code:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
...
# Wait until the download finishes. This code only works for a single
# download at a time, on Firefox.
browser.get('about:downloads')
WebDriverWait(browser, URL_LOAD_TIMEOUT).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'downloadIconShow')))
# an in-progress download shows 'downloadIconCancel' instead
browser.close()
browser.quit()
The problem with this approach is that it may depend on the Firefox version, if Mozilla changes the download manager page.
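An alternative that does not depend on the Firefox UI (and so survives download-manager redesigns) is to poll the download directory until no partial files remain. Firefox writes a ".part" file next to an in-progress download, so its disappearance is a reasonable, if heuristic, completion signal:

```python
import glob
import os
import time

def wait_for_downloads(directory, timeout=300, poll=0.5):
    """Return True once no *.part files remain, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not glob.glob(os.path.join(directory, "*.part")):
            return True  # nothing is still downloading
        time.sleep(poll)
    return False

# Usage sketch: after clicking the download link,
#   if wait_for_downloads("/home/me/Downloads"):
#       browser.quit()
```

Chrome uses ".crdownload" instead of ".part", so the glob pattern would need to change for a Chrome-based script.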
