I have a list of URLs and want to open all of them with WebDriver simultaneously, so the task finishes quickly instead of looping through the list one URL at a time.
Should I use Selenium Grid, or is there a simpler way?
My code looks as follows:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time
from selenium.webdriver.common.keys import Keys
list = ['www.link1.com', 'www.link2.com','www.link3.com'....]
for i in list:
    driver2 = webdriver.Chrome()
    driver2.get(i)
    time.sleep(1)
    try:
        finallinks = []
        all_links = driver2.find_elements(By.XPATH, "/html/body/main/section[1]/div/section[2]/div/div[1]/div/div/div[1]/div/div/section/main/div[2]/form/div[2]/div/div/a")
        print("HOLAAAAAA")
        for a in all_links:
            if str(a.get_attribute('href')).startswith("https://something/view") and a.get_attribute(
                    'href') not in finallinks:
                finallinks.append(a.get_attribute('href'))
        print(finallinks)
    except NoSuchElementException:
        print("Didn't exist")
If you want to run multiple instances of your webdriver tests in parallel, you can use Selenium Grid. It lets you distribute your tests across multiple machines and run them in parallel, which can significantly reduce the time it takes to complete a suite of tests.
However, if you are working on a small scale, you can also use multi-threading or multi-processing to run multiple webdriver instances simultaneously in your code. This approach may be simpler than setting up a Selenium Grid, but it will not scale as well if you need to run tests on many machines or if you need to run tests in different environments.
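As a rough illustration of the multi-threading route, here is a minimal sketch using a thread pool with one Chrome instance per URL. The function names `collect_links` and `run_in_threads`, the `max_workers` value, and the simplified "all anchor tags" lookup are assumptions for illustration, not part of the original code; only the `https://something/view` prefix comes from the question.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_links(url):
    """Open one URL in its own Chrome instance and return the matching hrefs."""
    # Imported lazily so run_in_threads can be reused with any other worker.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: scan every <a> tag instead of the question's long XPath.
        hrefs = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
        wanted = [h for h in hrefs if h and h.startswith("https://something/view")]
        # Keep each matching link once, preserving page order.
        return list(dict.fromkeys(wanted))
    finally:
        driver.quit()  # always release the browser, even on errors


def run_in_threads(urls, worker=collect_links, max_workers=4):
    """Run worker(url) for every URL concurrently and return {url: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, urls))
    return dict(zip(urls, results))
```

Calling `run_in_threads(list_of_urls)` would then start up to four browsers at once. Each browser is a separate process, so the practical limit is the memory and CPU of the machine rather than Python itself.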
Selenium Grid
Selenium Grid allows the execution of automation scripts on remote machines by routing commands sent by the client to remote browser instances using WebDriver. Selenium Grid enables us to run tests in parallel on multiple machines and also allows testing on different browser versions, enabling cross-platform testing.
This use case
A lot depends on the size of the list of urls.
Case A: In case of 5-10 urls:
list = ['www.link1.com', 'www.link2.com','www.link3.com'....'www.link10.com']
I would still go with a single node, just to save the cost of maintaining the Selenium Grid.
Case B: In case of more than 10 urls:
list = ['www.link1.com', 'www.link2.com','www.link3.com'....'www.link20.com']
It would be advisable to implement Selenium Grid and distribute the urls among the available Selenium Grid Nodes as follows:
Node 1:
list = ['www.link1.com', 'www.link2.com','www.link3.com'....'www.link10.com']
Node 2:
list = ['www.link11.com', 'www.link12.com','www.link13.com'....'www.link20.com']
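The split above could be sketched roughly as follows. `split_among_nodes` and `open_on_grid` are hypothetical helper names, and the hub address `http://localhost:4444` is an assumption; note also that the Grid itself decides which node serves each session, so the chunks here simply map to one remote session each.

```python
def split_among_nodes(urls, node_count):
    """Split urls into node_count contiguous chunks of near-equal size."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    base, extra = divmod(len(urls), node_count)
    chunks, start = [], 0
    for i in range(node_count):
        # The first `extra` chunks take one additional URL each.
        size = base + (1 if i < extra else 0)
        chunks.append(urls[start:start + size])
        start += size
    return chunks


def open_on_grid(urls, hub_url="http://localhost:4444"):
    """Visit each URL through one Grid session (hub_url is an assumed address)."""
    from selenium import webdriver  # requires Selenium and a running Grid

    driver = webdriver.Remote(command_executor=hub_url, options=webdriver.ChromeOptions())
    try:
        for url in urls:
            driver.get(url)
    finally:
        driver.quit()
```

Each chunk returned by `split_among_nodes` could be passed to `open_on_grid` from its own thread or process, so that the sessions actually run side by side.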
Related
Here is the code I have so far. My next step is taking the right elements from the website, i.e. the names of the most recent articles, and putting them in a list.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = r"C:\webdrivers"
driver = webdriver.Chrome()
driver.get("https://www.cnbc.com/business/")
This is what you should do:
from selenium import webdriver
from selenium.webdriver import ActionChains
PATH = "/Users/Samuel/PycharmProjects/MoneyMachine/drivers/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get("https://www.cnbc.com/business/")
action = ActionChains(driver)
list = []
for i in range(2):
    element = driver.find_element_by_xpath(f"/html/body/div[2]/div/div[1]/div[3]/div/div/div/div[3]/div[1]/div[1]/section/div/div[1]/div[{i+1}]/div/div/div/div[1]/div/a/div").text
    list.append(element)
for i in range(3):
    element = driver.find_element_by_xpath(f"/html/body/div[2]/div/div[1]/div[3]/div/div/div/div[3]/div[1]/div[1]/section/div/div[2]/div[{i+1}]/div/div/div/div[1]/div/a/div").text
    list.append(element)
driver.close()
print(list)
driver.find_element_by_xpath("XPATH") finds an element for you (in Selenium 4, where the find_element_by_* helpers were removed, the equivalent call is driver.find_element(By.XPATH, "XPATH")). To find out what to put inside the quotes, right-click the element you want and select Inspect. Then hover over the element in the inspector, right-click it, and choose Copy full XPath.
I think you should check out BeautifulSoup (BS4) for this sort of project; I think it will be a better fit for your case, and BS4 is more user-friendly.
Here are some more reasons you should use BS4 for this project:
"
Bandwidth, and time to run your script. Using Selenium means fetching all the resources that would normally be fetched when you visit a page in a browser - stylesheets, scripts, images, and so on. This is probably unnecessary.
Stability and ease of error recovery. Selenium can be a little fragile, in my experience - even with PhantomJS - and creating the architecture to kill a hung Selenium instance and create a new one is a little more irritating than setting up simple retry-on-exception logic when using requests.
Potentially, CPU and memory usage - depending upon the site you're crawling, and how many spider threads you're trying to run in parallel, it's conceivable that either DOM layout logic or JavaScript execution could get pretty expensive.
"
From - Selenium versus BeautifulSoup for web scraping
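To make the "simple retry-on-exception logic" mentioned in the quote concrete, here is a minimal sketch. The name `fetch_with_retry`, its parameters, and the idea of passing the fetching function in as an argument are assumptions for illustration; with requests, `fetch` might be something like `lambda u: requests.get(u, timeout=10).text`.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url), retrying after any exception, up to `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # in real code, catch a narrower exception type
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next try
    raise last_error
```

The returned HTML could then be handed to BeautifulSoup for parsing, with no browser process to monitor or restart.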
Edit: There are other questions addressing the ability to interact with pages that aren't fully loaded. THIS IS NOT THAT. This is specific to the SeleniumWire driver, not just Selenium Webdriver.
I'm currently working on a project using Selenium with Chromedriver in Python 3.8, which requires manipulating a page that takes a very long time to load. As such, I'm using the 'eager' page load strategy (options.page_load_strategy = 'eager') in order to be able to manipulate certain elements of the page before it loads fully.
I set up a test that measures the time for an element to be clicked after the browser has been declared (effectively measuring how long the page takes to load to the point where a constant button can be clicked). When I used the regular Selenium Webdriver, running 15 tests got me an average time of 0.7352 seconds. However, when I used the SeleniumWire Webdriver (with the choice of Webdriver being the only change), my load times after 15 tests averaged 4.3745 seconds. These load times were on par with those from running this test on Selenium Webdriver using the 'normal' (default) page load strategy, which after 15 tests averaged 4.3900 seconds.
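For reference, a timing comparison like the one described could be reproduced with a small harness along these lines. This is only a sketch: `average_seconds` is a hypothetical helper, and `action` stands for whatever callable creates the driver (with options.page_load_strategy = 'eager'), loads the page, clicks the button, and quits.

```python
import time
from statistics import mean


def average_seconds(action, runs=15):
    """Run action() `runs` times and return the mean wall-clock duration."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

Running it once with the plain Selenium driver and once with the SeleniumWire driver, with everything else unchanged, gives the two averages to compare.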
Thus, I believe that SeleniumWire is not utilizing the page load strategy and I was looking for possible solutions. How can I make sure that SeleniumWire uses eager loading?
I am trying to write a Python program that visits a site I host for testing and development purposes. I would like to use only the Microsoft Edge browser for this task, but I can't figure out how to run the script headless so that it consumes fewer resources. I did my research, and as far as I understand Edge does not seem to have a headless option (please correct me if I am wrong). Is there any way to size the browser window that pops up on my screen to zero dimensions? If so, does it use fewer resources since there is nothing to render?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#from selenium.webdriver.edge.options import Options
# setting up headless option for faster execution
#options = Options()
#options.headless = True
engine = webdriver.Edge()
#engine = webdriver.Edge(executable_path="path/to/executable")
engine.get('my_site_link_goes_here')
assert 'title' in engine.title
print("Done")
When I run this script everything works as expected, but I would like to make it as close to headless as possible.
P.S. I can only use the Edge browser for testing.
Older versions of Selenium WebDriver provide no direct method for minimizing the browser (newer releases add one, minimize_window() in the Python bindings). Without it, you need to resize the window instead. In Java:
Dimension d = new Dimension(300,1080);
// Resize current window to the set dimension
driver.manage().window().setSize(d);
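Since the question uses Python, a rough equivalent of the Java snippet above might look like this. `shrink_window` is a hypothetical helper name; `set_window_size` and `get_window_size` are the Python bindings' window methods, and the default dimensions simply mirror the Java example.

```python
def shrink_window(driver, width=300, height=1080):
    """Resize the current browser window and return the size the driver reports."""
    driver.set_window_size(width, height)
    return driver.get_window_size()


# Possible usage with the question's Edge driver (assumes Selenium is installed):
#   from selenium import webdriver
#   engine = webdriver.Edge()
#   shrink_window(engine)
```

Keep in mind that a smaller window still renders the page, so the resource savings are likely to be modest compared with a true headless mode.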
I would like to run my script on multiple browsers using Selenium.
As of now I am able to perform the operation by opening one browser at a time.
E.g. register to Amazon.
I want to be able to register two users to Amazon at the same time.
This is the code I have as of now.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
driver.get("https://www.amazon.com/ap/register?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.amazon.com%2F%3Fref_%3Dnav_signin&prevRID=VBHFJ50CPKFJ3PGG7RDY&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=usflex&openid.mode=checkid_setup&openid.ns.pape=http%3A%2F%2Fspecs.openid.net%2Fextensions%2Fpape%2F1.0&prepopulatedLoginId=&failedSignInCount=0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&pageId=usflex&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0")
driver.find_element_by_xpath("""//*[@id="s2id_ID_form4a8055de_guest_register_sponsor_lookup"]/a/span[2]/b""").click()
driver.find_element_by_xpath("""//*[@id="s2id_autogen1_search"]""").send_keys(v1)
With this I can run it for one user at a time. But I want to be able to register more than two users, up to n users, at the same time.
Hence the question about multiple windows.
You could create multiple instances of the webdriver. You can then manipulate each individually. For example,
from selenium import webdriver
driver1 = webdriver.Chrome()
driver2 = webdriver.Chrome()
driver1.get("http://google.com")
driver2.get("http://yahoo.com")
This question is a bit old at this point, but I still found it applicable to something I was having trouble with today.
To run browser sessions in parallel you can use multiprocessing. Essentially, each function runs in its own process with its own browser instance, and since every process has its own Python interpreter and GIL, the scripts do not block one another. You can then start each of the processes in your main code and they will all execute in parallel.
If you need an explanation on how to do this, a great video can be found here
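In case the video is unavailable, here is a minimal sketch of the idea. The names `open_site`, `build_processes`, and `run_all` are hypothetical, and the per-user steps are left as a placeholder because they depend on the site being automated.

```python
from multiprocessing import Process


def open_site(url):
    """Worker run inside each process: one browser instance per process."""
    from selenium import webdriver  # requires Selenium and a Chrome driver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Placeholder: per-user steps such as filling in a form would go here.
    finally:
        driver.quit()


def build_processes(worker, items):
    """Create one (not yet started) process per item."""
    return [Process(target=worker, args=(item,)) for item in items]


def run_all(processes):
    """Start every process, then wait for all of them to finish."""
    for process in processes:
        process.start()
    for process in processes:
        process.join()


# Possible usage, placed under an `if __name__ == "__main__":` guard:
#   run_all(build_processes(open_site, ["http://google.com", "http://yahoo.com"]))
```

The main guard matters on platforms that start child processes by re-importing the script, since without it each child would try to launch its own set of processes.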
To run my functional tests I use LiveServerTestCase.
I want to call set_speed (and other methods; set_speed is just an example) that aren't in the webdriver but are in the selenium object.
http://selenium.googlecode.com/git/docs/api/py/selenium/selenium.selenium.html#module-selenium.selenium
my subclass of LiveServerTestCase
from django.test import LiveServerTestCase
from selenium import webdriver
class SeleniumLiveServerTestCase(LiveServerTestCase):
    @classmethod
    def setUpClass(cls):
        cls.driver = webdriver.Firefox()
        cls.driver.implicitly_wait(7)
        cls.driver.maximize_window()
        # how to call selenium.selenium.set_speed() from here? how to get the ref to the selenium object?
        super(SeleniumLiveServerTestCase, cls).setUpClass()
How do I get that? I can't call the constructor on selenium, I think.
You don't. Setting the speed in WebDriver is not possible and the reason for this is that you generally shouldn't need to, and the 'waiting' is now done at a different level.
Before, it was possible to tell Selenium not to run at normal speed but at a slower speed, so that more things would be available on page load for slow-loading or AJAX-heavy pages.
Now, you do away with that altogether. Example:
I have a login page. I log in, and once logged in I see a "Welcome" message. The problem is that the Welcome message is not displayed instantly but after a time delay (using jQuery).
Pre-WebDriver code would tell Selenium: run this test, but slow down here so we can wait until the Welcome message appears.
Newer WebDriver code would tell Selenium: run this test, but when we log in, wait up to 20 seconds for the Welcome message to appear, using explicit waits.
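To show what an explicit wait does under the hood, here is a plain-Python sketch of the polling loop. `wait_until` is a hypothetical stand-in, not Selenium's API; in real WebDriver code you would use WebDriverWait with an expected condition, as in the comment at the end, where the element id "welcome" is an assumption.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# With Selenium itself, the equivalent explicit wait would be along the lines of:
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#   WebDriverWait(driver, 20).until(
#       EC.visibility_of_element_located((By.ID, "welcome")))
```

The point is that the test proceeds as soon as the message appears and only pays the full 20 seconds when something is actually wrong, which a global slowdown cannot offer.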
Now, if you really want access to "set" Selenium's speed, first off I'd recommend against it but the solution would be to dive into the older, now deprecated code.
If you use WebDriver heavily already, you can use the WebDriverBackedSelenium which can give you access to the older Selenium methods, whilst keeping the WebDriver backing the same, therefore much of your code would stay the same.
https://groups.google.com/forum/#!topic/selenium-users/6E53jIIT0TE
The second option is to dive into the old Selenium code and use it. This will change a lot of your existing code (because it predates the "WebDriver" concept).
The code for both Selenium RC & WebDriverBackedSelenium lives here, for the curious:
https://code.google.com/p/selenium/source/browse/py/selenium/selenium.py
Something along the lines of:
from selenium import webdriver
from selenium import selenium
driver = webdriver.Firefox()
sel = selenium('localhost', 4444, '*webdriver', 'http://www.google.com')
sel.start(driver = driver)
You'd then get access to do this:
sel.set_speed(5000)