How to handle elements inside Shadow DOM from Selenium - python

I want to automate file download completion checking in chromedriver.
HTML of each entry in downloads list looks like
<a is="action-link" id="file-link" tabindex="0" role="link" href="http://fileSource" class="">DownloadedFile#1</a>
So I use following code to find target elements:
driver.get('chrome://downloads/') # This page should be available for everyone who use Chrome browser
driver.find_elements_by_tag_name('a')
This returns empty list while there are 3 new downloads.
As I found out, only parent elements of #shadow-root (open) tag can be handled.
So How can I find elements inside this #shadow-root element?

Sometimes the shadow root elements are nested and the second shadow root is not visible in document root, but is available in its parent accessed shadow root. I think is better to use the selenium selectors and inject the script just to take the shadow root:
def expand_shadow_element(element):
shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
return shadow_root
outer = expand_shadow_element(driver.find_element_by_css_selector("#test_button"))
inner = outer.find_element_by_id("inner_button")
inner.click()
To put this into perspective I just added a testable example with Chrome's download page, clicking the search button needs open 3 nested shadow root elements:
import selenium
from selenium import webdriver
driver = webdriver.Chrome()
def expand_shadow_element(element):
shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
return shadow_root
driver.get("chrome://downloads")
root1 = driver.find_element_by_tag_name('downloads-manager')
shadow_root1 = expand_shadow_element(root1)
root2 = shadow_root1.find_element_by_css_selector('downloads-toolbar')
shadow_root2 = expand_shadow_element(root2)
root3 = shadow_root2.find_element_by_css_selector('cr-search-field')
shadow_root3 = expand_shadow_element(root3)
search_button = shadow_root3.find_element_by_css_selector("#search-button")
search_button.click()
Doing the same approach suggested in the other answers has the drawback that it hard-codes the queries, is less readable and you cannot use the intermediary selections for other actions:
search_button = driver.execute_script('return document.querySelector("downloads-manager").shadowRoot.querySelector("downloads-toolbar").shadowRoot.querySelector("cr-search-field").shadowRoot.querySelector("#search-button")')
search_button.click()
later edit:
I recently try to access the content settings(see code below) and it has more than one shadow root elements imbricated now you cannot access one without first expanding the other, when you usually have also dynamic content and more than 3 shadow elements one into another it makes impossible automation. The answer above use to work a few time ago but is enough for just one element to change position and you need to always go with inspect element an ho up the tree an see if it is in a shadow root, automation nightmare.
Not only was hard to find just the content settings due to the shadowroots and dynamic change when you find the button is not clickable at this point.
driver = webdriver.Chrome()
def expand_shadow_element(element):
shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
return shadow_root
driver.get("chrome://settings")
root1 = driver.find_element_by_tag_name('settings-ui')
shadow_root1 = expand_shadow_element(root1)
root2 = shadow_root1.find_element_by_css_selector('[page-name="Settings"]')
shadow_root2 = expand_shadow_element(root2)
root3 = shadow_root2.find_element_by_id('search')
shadow_root3 = expand_shadow_element(root3)
search_button = shadow_root3.find_element_by_id("searchTerm")
search_button.click()
text_area = shadow_root3.find_element_by_id('searchInput')
text_area.send_keys("content settings")
root0 = shadow_root1.find_element_by_id('main')
shadow_root0_s = expand_shadow_element(root0)
root1_p = shadow_root0_s.find_element_by_css_selector('settings-basic-page')
shadow_root1_p = expand_shadow_element(root1_p)
root1_s = shadow_root1_p.find_element_by_css_selector('settings-privacy-page')
shadow_root1_s = expand_shadow_element(root1_s)
content_settings_div = shadow_root1_s.find_element_by_css_selector('#site-settings-subpage-trigger')
content_settings = content_settings_div.find_element_by_css_selector("button")
content_settings.click()

There is also ready to use pyshadow pip module, which worked in my case, below example:
from pyshadow.main import Shadow
from selenium import webdriver
driver = webdriver.Chrome('chromedriver.exe')
shadow = Shadow(driver)
element = shadow.find_element("#Selector_level1")
element1 = shadow.find_element("#Selector_level2")
element2 = shadow.find_element("#Selector_level3")
element3 = shadow.find_element("#Selector_level4")
element4 = shadow.find_element("#Selector_level5")
element5 = shadow.find_element('#control-button') #target selector
element5.click()

You can use the driver.executeScript() method to access the HTML elements and JavaScript objects in your web page.
In the exemple below, executeScript will return in a Promise the Node List of all <a> elements present in the Shadow tree of element which id is host. Then you can perform you assertion test:
it( 'check shadow root content', function ()
{
return driver.executeScript( function ()
{
return host.shadowRoot.querySelectorAll( 'a' ).then( function ( n )
{
return expect( n ).to.have.length( 3 )
}
} )
} )
Note: I don't know Python so I've used the JavaScript syntax but it should work the same way.

I would add this as a comment but I don't have enough reputation points--
The answers by Eduard Florinescu works well with the caveat that once you're inside a shadowRoot, you only have the selenium methods available that correspond to the available JS methods--mainly select by id.
To get around this I wrote a longer JS function in a python string and used native JS methods and attributes (find by id, children + indexing etc.) to get the element I ultimately needed.
You can use this method to also access shadowRoots of child elements and so on when the JS string is run using driver.execute_script()

With selenium 4.1 there's a new attribute shadow_root for the WebElement class.
From the docs:
Returns a shadow root of the element if there is one or an error. Only works from Chromium 96 onwards. Previous versions of Chromium based browsers will throw an assertion exception.
Returns:
ShadowRoot object or
NoSuchShadowRoot - if no shadow root was attached to element
A ShadowRoot object has the methods find_element and find_elements but they're currently limited to:
By.ID
By.CSS_SELECTOR
By.NAME
By.CLASS_NAME
Shadow roots and explicit waits
You can also combine that with WebdriverWait and expected_conditions to obtain a decent behaviour. The only caveat is that you must use EC that accept WebElement objects. At the moment it's just one of the following ones:
element_selection_state_to_be
element_to_be_clickable
element_to_be_selected
invisibility_of_element
staleness_of
visibility_of
Example
e.g. borrowing the example from eduard-florinescu
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()
timeout = 10
driver.get("chrome://settings")
root1 = driver.find_element_by_tag_name('settings-ui')
shadow_root1 = root1.shadow_root
root2 = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root1.find_element(by=By.CSS_SELECTOR, value='[page-name="Settings"]')))
shadow_root2 = root2.shadow_root
root3 = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root2.find_element(by=By.ID, value='search')))
shadow_root3 = root3.shadow_root
search_button = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root3.find_element(by=By.ID, value="searchTerm")))
search_button.click()
text_area = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root3.find_element(by=By.ID, value='searchInput')))
text_area.send_keys("content settings")
root0 = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root1.find_element(by=By.ID, value='main')))
shadow_root0_s = root0.shadow_root
root1_p = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root0_s.find_element(by=By.CSS_SELECTOR, value='settings-basic-page')))
shadow_root1_p = root1_p.shadow_root
root1_s = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root1_p.find_element(by=By.CSS_SELECTOR, value='settings-privacy-page')))
shadow_root1_s = root1_s.shadow_root
content_settings_div = WebDriverWait(driver, timeout).until(EC.visibility_of(shadow_root1_s.find_element(by=By.CSS_SELECTOR, value='#site-settings-subpage-trigger')))
content_settings = WebDriverWait(driver, timeout).until(EC.visibility_of(content_settings_div.find_element(by=By.CSS_SELECTOR, value="button")))
content_settings.click()

I originally implemented Eduard's solution just slightly modified as a loop for simplicity. But when Chrome updated to 96.0.4664.45 selenium started returning a dict instead of a WebElement when calling 'return arguments[0].shadowRoot'.
I did a little hacking around and found out I could get Selenium to return a WebElement by calling return arguments[0].shadowRoot.querySelector("tag").
Here's what my final solution ended up looking like:
def get_balance_element(self):
# Loop through nested shadow root tags
tags = [
"tag2",
"tag3",
"tag4",
"tag5",
]
root = self.driver.find_element_by_tag_name("tag1")
for tag in tags:
root = self.expand_shadow_element(root, tag)
# Finally there. GOLD!
return [root]
def expand_shadow_element(self, element, tag):
shadow_root = self.driver.execute_script(
f'return arguments[0].shadowRoot.querySelector("{tag}")', element)
return shadow_root
Clean and simple, works for me.
Also, I could only get this working Selenium 3.141.0. 4.1 has a half baked shadow DOM implementation that just manages to break everything.

The downloaded items by google-chrome are within multiple #shadow-root (open).
Solution
To extract the contents of the table you have to use shadowRoot.querySelector() and you can use the following locator strategy:
Code Block:
driver = webdriver.Chrome(service=s, options=options)
driver.execute("get", {'url': 'chrome://downloads/'})
time.sleep(5)
download = driver.execute_script("""return document.querySelector('downloads-manager').shadowRoot.querySelector('downloads-item').shadowRoot.querySelector('a#file-link')""")
print(download.text)

Related

Pop-up saying element not interactable Selenium

There are two pop-ups: One that asks if you live in California and the second one looks like this:
Here is my code:
The second pop-up doesn't show up every time and when it doesn't the function works. When it does I get an element not interactable error and I don't know why. Here is the inspector for the second pop-up close-btn.
test_data = ['Los Angeles','San Ramon']
base_url = "https://www.hunterdouglas.com/locator"
def pop_up_one(base_url):
driver.get(base_url)
try:
submit_btn = WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[aria-label='No, I am not a California resident']")))
submit_btn.click()
time.sleep(5)
url = driver.current_url
submit_btn = WebDriverWait(driver,5).until(EC.presence_of_element_located((By.XPATH, "//*[contains(#class,'icon')]")))
print(submit_btn.text)
#submit_btn.click()
except Exception as t:
url = driver.current_url
print(t)
return url
else:
url = driver.current_url
print("second pop_up clicked")
return url
I have tried selecting by the aria-label, class_name, xpath, etc. the way I have it now shows that there is a selenium web element when I print just the element but it doesn't let me click it for some reason. Any direction appreciated. Thanks!
There are 41 elements on that page matching the //*[contains(#class,'icon')] XPath locator. At least the first element is not visible and not clickable, so when you trying to click this submit_btn element this gives element not interactable error.
In case this element is not always appearing you should use logic clicking element only in case the element appeared.
With the correct, unique locator you code can be something like this:
submit_btn = WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[aria-label='No, I am not a California resident']")))
submit_btn.click()
time.sleep(5)
url = driver.current_url
submit_btn = driver.find_elements_by_xpath('button[aria-label="Close"]')
if(submit_btn):
submit_btn[0].click()
Here I'm using find_elements_by_xpath, this returns a list of web elements. In case element was not found it will be an empty list, it is interpreted as Boolean False in Python.
In case the element is found we will click on the first element in the returned list which will obviously the desired element.
UPD
See the correct locator used here.
It can also be in XPath syntax as //button[#aria-label="Close"]

How can open href at python?

I want to open 'text' botton but I can't open it.
How can I do?
from selenium import webdriver
import time
driver = webdriver.Chrome(path)
driver.get("https://www.pdfescape.com/account/unregistered/")
time.sleep(0.5)
elem = driver.find_element_by_id("dStartNew").click()
time.sleep(0.5)
elem = driver.find_element_by_id("PdfNew_input_pc").click()
driver.find_element_by_css_selector("[value=\"2\"]").click()
#time.sleep(0.5)
elem = driver.find_element_by_id("PdfNew_input_ps").click()
driver.find_element_by_css_selector("[value=\"a4\"]").click()
#time.sleep(0.5)
driver.find_element_by_xpath('//*[#id="PdfNew_form"]/input[1]').click()
time.sleep(0.5)
driver.find_elements_by_class_name('Add-Text').click()
Problem code : driver.find_elements_by_class_name('Add-Text').click()
The element you wanted to click has no class name - it has the title "Add Text", also you are using find_elements instead of find_element which changes the result type.
So you could select it with:
driver.find_element_by_css_selector('a[title=\"Add Text\"]').click()
Also please let me add another improvement:
driver.find_element_by_xpath('//*[#id="PdfNew_form"]/input[1]').click()
could be much more stable (for the case of page changes) written as:
driver.find_element_by_css_selector('input[type=\"submit\"]').click()
driver.find_elements_by_class_name('Add-Text'),click()
Above code should return a list of elements and you can not click on list of elements
This element is not present in the DOM
To click we use '.' instead of ','

How to select a value from a drop-down using Selenium from a website with special setting- Python

Note: I particularly deal with this website
How can I use selenium with Python to get the reviews on this page to sort by 'Most recent'?
What I tried was:
driver.find_element_by_id('sort-order-dropdown').send_keys('Most recent')
from this didn't cause any error but didn't work.
Then I tried
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element_by_id('sort-order-dropdown'))
select.select_by_value('recent')
select.select_by_visible_text('Most recent')
select.select_by_index(1)
I've got: Message: Element <select id="sort-order-dropdown" class="a-native-dropdown" name=""> is not clickable at point (66.18333435058594,843.7999877929688) because another element <span class="a-dropdown-prompt"> obscures it
This one
element = driver.find_element_by_id('sort-order-dropdown')
element.click()
li = driver.find_elements_by_css_selector('#sort-order-dropdown > option:nth-child(2)')
li.click()
from this caused the same error msg
This one from this caused the same error also
Select(driver.find_element_by_id('sort-order-dropdown')).select_by_value('recent').click()
So, I'm curious to know if there is any way that I can select the reviews to sort from the most recent first.
Thank you
This worked for me using Java:
#Test
public void amazonTest() throws InterruptedException {
String URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
String menuSelector = ".a-dropdown-prompt";
String menuItemSelector = ".a-dropdown-common .a-dropdown-item";
driver.get(URL);
Thread.sleep(2000);
WebElement menu = driver.findElement(By.cssSelector(menuSelector));
menu.click();
List<WebElement> menuItem = driver.findElements(By.cssSelector(menuItemSelector));
menuItem.get(1).click();
}
You can reuse the element names and follow a similar path using Python.
The key points here are:
Click on the menu itself
Click on the second menu item
It is a better practice not to hard-code the item number but actually read the item names and select the correct one so it works even if the menu changes. This is just a note for future improvement.
EDIT
This is how the same can be done in Python.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
menuSelector = ".a-dropdown-prompt";
menuItemSelector = ".a-dropdown-common .a-dropdown-item";
driver = webdriver.Chrome()
driver.get(URL)
elem = driver.find_element_by_css_selector(menuSelector)
elem.click()
time.sleep(1)
elemItems = []
elemItems = driver.find_elements_by_css_selector(menuItemSelector)
elemItems[1].click()
time.sleep(5)
driver.close()
Just to keep in mind, css selectors are a better alternative to xpath as they are much faster, more robust and easier to read and change.
This is the simplified version of what I did to get the reviews sorted from the most recent ones. As "Eugene S" said above, the key point is to click on the button itself and select/click the desired item from the list. However, my Python code use XPath instead of selector.
# click on "Top rated" button
driver.find_element_by_xpath('//*[#id="a-autoid-4-announce"]').click()
# this one select the "Most recent"
driver.find_element_by_xpath('//*[#id="sort-order-dropdown_1"]').click()

How to scrape dynamic content with BS4 or Selenium (Python)?

I'm trying to scrape all the file paths in a Github repo from the File Finder page (example).
Beautiful Soup 4 fails to scrape the <tbody class="js-tree-finder-results js-navigation-container js-active-navigation-container"> element that wraps the list of file paths. I figured this is b/c bs4 can't scrape dynamic content, so I tried waiting for all the elements to load with Selenium:
driver = webdriver.PhantomJS()
driver.get("https://github.com/chrisspen/weka/find/master")
# Explicitly wait for the element to become present
wait = WebDriverWait(driver, 10)
element = wait.until(
EC.presence_of_element_located((
By.CSS_SELECTOR, "js-tree-finder-results.js-navigation-container.js-active-navigation-container"
))
)
# Pass the found element into the script
items = driver.execute_script('return arguments[0].innerHTML;', element)
print(items)
But it's still failing to find the element.
What am I doing wrong?
P.S.
The file paths can be grabbed easily via the JS console with the following script:
window.onload = function() {
var fileLinks = document.getElementsByClassName("css-truncate-target js-navigation-open js-tree-finder-path");
var files = [];
for (var i = 0; i < fileLinks.length; i++) {
files.push(fileLinks[i].innerHTML);
}
return files;
}
Edit:
My program requires the use of a headless browser, like PhantomJS.
The current version of PhantomJS can't capture Github content because of Github's Content-Security-Policy setting. This is a known bug and documented here. According to that issue page, downgrading to PhantomJS version 1.9.8 is a way around the problem.
Another solution would be to use chromedriver:
driver = webdriver.Chrome('chromedriver')
driver.get("https://github.com/chrisspen/weka/find/master")
wait = WebDriverWait(driver, 10)
element = wait.until(
EC.presence_of_element_located((
By.CLASS_NAME, "js-tree-finder-path"
))
)

Selenium Extraction Problems: Waits/Not Finding Elements

In both chrome and firefox, everything is fine up until I need to extract text. I get this error:
h3 = next(element for element in h3s if element.is_displayed())
StopIteration
I even added a fluent wait.
browser = webdriver.Firefox()
browser.get('https://www.voilanorbert.com/')
inputElement = browser.find_element_by_id("form-search-name")
inputElement.send_keys(leadslist[i][0])
inputElement = browser.find_element_by_id("form-search-domain")
inputElement.send_keys(leadslist[i][1])
searchbutton = browser.find_element_by_name("search")
searchbutton.click()
wait = WebDriverWait(browser, 20)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results")))
wait2 = WebDriverWait(browser, 3000, poll_frequency=100, ignored_exceptions=[ElementNotVisibleException])
wait2.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h3.one")))
h3s = browser.find_elements_by_css_selector('h3.one')
h3 = next(element for element in h3s if element.is_displayed())
result = h3.text
I think its because its not actually extracting anything, so its just an empty list.
Some pictures that will probably help:
This is the before picture:
This is the after picture:
I need to extract what is in the "text-center displayed" class of the "result" class.
The answer is fairly simple, you just need a different selector when waiting for the search result.
The approach below (C#) works perfectly, it'll reduce your code with a few lines.
One "result DIV" becomes visible when the search is done. It is the only element with the "text-center displayed" class, so that's all your selector needs.
Once such a DIV is displayed, you know where to pinpoint the H3 element (it's a child of said DIV).
So simply wait for the below element to become visible after you've clicked the search button:
IWebElement headerResult = w.Until(ExpectedConditions.ElementIsVisible(By.CssSelector("div[class=\"text-center displayed\"] h3")));
string result = headerResult.Text;

Categories