I want to use selenium to loop over a few divs on a webpage and find the content of the divs
The basic setup of the webpage looks like this:
<html>
<div data-testid="property-card">
<div data-testid="title">content 1</div>
</div>
<div data-testid="property-card">
<div data-testid="title">content 2</div>
</div>
<div data-testid="property-card">
<div data-testid="title">content 3</div>
</div>
</html>
and here is my code:
def report_results(self):
    hotel_boxes = self.find_elements(By.XPATH, '//div[@data-testid="property-card"]')
    for hotel in hotel_boxes:
        hotel_name = hotel.find_element(By.XPATH, '//div[@data-testid="title"]').get_attribute('innerHTML')
        print(hotel_name)
However, the problem is that this only prints "content 1" for three times. What am I doing wrong here?
You are almost there; the only thing you are missing is a dot . at the front of the XPath expression.
It should be
hotel_name = hotel.find_element(By.XPATH, './/div[@data-testid="title"]').get_attribute('innerHTML')
The '//div[@data-testid="title"]' XPath expression searches for a matching element from the top of the page and always returns the first match.
With the leading dot ., the search starts inside the current node instead, i.e. inside the parent element hotel.
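The effect of the leading dot can be sketched without a browser using the standard library's xml.etree.ElementTree, whose limited XPath subset also supports relative .// searches. The markup below mirrors the question's HTML; this is only an illustration of the scoping rule, not Selenium itself.

```python
import xml.etree.ElementTree as ET

html = """<html>
<div data-testid="property-card"><div data-testid="title">content 1</div></div>
<div data-testid="property-card"><div data-testid="title">content 2</div></div>
<div data-testid="property-card"><div data-testid="title">content 3</div></div>
</html>"""

root = ET.fromstring(html)
cards = root.findall(".//div[@data-testid='property-card']")

# Relative search (leading dot): each lookup is scoped to its own card,
# so every card yields its own title rather than the first one on the page.
relative = [card.find(".//div[@data-testid='title']").text for card in cards]
print(relative)  # ['content 1', 'content 2', 'content 3']
```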
So, your entire code can be:
def report_results(self):
    hotel_boxes = self.find_elements(By.XPATH, '//div[@data-testid="property-card"]')
    for hotel in hotel_boxes:
        hotel_name = hotel.find_element(By.XPATH, './/div[@data-testid="title"]').get_attribute('innerHTML')
        print(hotel_name)
As per the given HTML:
<html>
<div data-testid="property-card">
<div data-testid="title">content 1</div>
</div>
<div data-testid="property-card">
<div data-testid="title">content 2</div>
</div>
<div data-testid="property-card">
<div data-testid="title">content 3</div>
</div>
</html>
To print the innerText of the descendant <div> tags you can use a list comprehension with either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div[data-testid='property-card'] > [data-testid='title']")])
Using XPATH and .get_attribute('innerHTML'):
print([my_elem.get_attribute('innerHTML') for my_elem in driver.find_elements(By.XPATH, "//div[@data-testid='property-card']/div[@data-testid='title']")])
Related
I am trying to navigate to a search box and use send_keys with Selenium in Python, but I am completely stuck.
And here is the source code snippet:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<div id="LeftTreeFrame" class="leftNavBackground" >
<div class="ui-widget searchPanelContainer">
<div id="searchPanel" class="search-field-container search-field-container-margin">
<input type="text" doesntDirty id="Search" name="Search" class="search-text-field-left-tree-frame" NoHighlight="nohighlight"/>
<div class="search-field-icon-container">
<a id="searchlbl" href="#"><img src="../images/normal_search_u39.svg" title="Go To Page" /></a>
</div>
</div>
</div>
<div id='pageNavigation'>
<div id='ootbNavigationPage'></div>
<div id='favoriteNavigationPage'></div>
<div id='adminNavigationPage'></div>
<div id='navigationEmptyState' class="treeEmptyState">
<div class="message"></div>
</div>
</div>
<div class="navigation-view-mode-container">
<div class="box" onclick="renderModel(0)">
<button type="button">
<span class="svg-load ootb-icon" data-src="~/images/Reskin/ootb-icon.svg"></span>
</button>
</div>
<div class="star" onclick="renderModel(1)">
<button type="button">
<span class="svg-load star-icon" data-src="~/images/Reskin/star.svg"></span>
</button>
</div>
<div class="person" onclick="renderModel(2)">
<button type="button">
<span class="svg-load person-icon" data-src="~/images/Reskin/person-nav.svg"></span>
</button>
</div>
</div>
</div>
When I try to do
element = driver.find_element(By.XPATH, '//input[@name="Search"]')
element.send_keys('test')
I get error "selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable"
I have tried everything I can imagine, but cannot click the element or send keys.
Also, this page is a new page that opens after the last successful click. I first tried switching to this page by
# printing handles
handles = driver.window_handles
i = 0
for handle in handles:
    print(f"Handle {i}: {handle}\n")
    i += 1

# after confirming the new page is the second handle:
driver.switch_to.window(handles[1])
print(f"Title: {driver.title}")
print(f"Current url: {driver.current_url}")
print('\n')

# I can even find the tag I am looking for after switching to the new window:
all_div_tags = driver.find_elements(By.TAG_NAME, "input")
for tag in all_div_tags:
    print(f"Attribute name: {tag.get_attribute('name')}\n")

# but I cannot get to the search box. Thank you in advance!
Look at the HTML code and notice that //input[@name="Search"] is contained in an <iframe>. In order to select an element inside an iframe with find_element() you first have to switch to the iframe, as shown in this code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "frmCode")))
element = driver.find_element(By.XPATH, '//input[@name="Search"]')
...
I have a HTML code like :
<div class="A">
<div class="B"></div>
<div class="B">
<div class="C"></div>
<div class="C">
<p class="D"> Element 1 </p>
<div class="C"></div>
</div>
</div>
</div>
<div class="A">
<div class="B"></div>
<div class="B">
<div class="C"></div>
<div class="C">
<p class="D"> Element 2 </p>
<div class="C"></div>
</div>
</div>
</div>
(this is an example, there is more class "A")
I want to extract the text "Element 2" with Python Selenium.
I tried a lot of things but always the same result : No such element: Unable to locate element...
I tried :
elem = driver.find_element_by_xpath("//div[@class='A:last-child']/p[@class='D']").text
same result...
Try this:
"(//div[@class='A']//p)[2]"
This should get the second instance of class "A" and then the p element beneath it.
Try this XPath:
"(//div[@class='A']//p)[last()]"
The main problem with your XPath, I think, is that the single slash before the p element means to only look for direct children of the div. You want the double slash to find any descendant.
For this structure, the XPath
(//div[@class="A"]//p[@class="D"])[2]
should work if it is the second occurrence, or
(//div[@class="A"]//p[@class="D"])[last()]
if it is the last one.
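The last-occurrence idea can be checked without a browser using the standard library; ElementTree's XPath subset has no last(), so plain Python indexing stands in for it here. The markup below is a well-formed version of the question's sample.

```python
import xml.etree.ElementTree as ET

html = """<root>
<div class="A"><div class="B"></div><div class="B">
<div class="C"></div><div class="C"><p class="D"> Element 1 </p><div class="C"></div></div>
</div></div>
<div class="A"><div class="B"></div><div class="B">
<div class="C"></div><div class="C"><p class="D"> Element 2 </p><div class="C"></div></div>
</div></div>
</root>"""

root = ET.fromstring(html)
# Take the last div.A, then search for the p.D anywhere beneath it:
last_a = root.findall("./div[@class='A']")[-1]
text = last_a.find(".//p[@class='D']").text.strip()
print(text)  # Element 2
```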
So here is the following HTML code
<div>
<div id='parent-1'>
<div classname="fiasd">
<div classname="ehuh">
<div classname="target-me-1">
</div>
</div>
</div>
</div>
<div id='parent-1'>
<div classname="fiasd">
<div classname="ehuh">
<div classname="target-me-1">
</div>
</div>
</div>
</div>
<div id='parent-1'>
<div classname="fiasd">
<div classname="ehuh">
<div classname="target-me-1">
</div>
</div>
</div>
</div>
</div>
MY APPROACH
I am using Selenium to find all the elements with id="parent-1", which gives me a list of 3 elements. Now I want to target the 'target-me-1' element using a reference from the parent element.
As in I want to find 'target-me-1' element within and only within the specific 'parent-1' elements.
Is it possible to find an element only within a selected element?
Elements found by the driver are themselves searchable — you can call find_element on an element to search only within it:
from selenium import webdriver
from selenium.webdriver.common.by import By

URL = 'url'
browser = webdriver.Chrome()
browser.get(URL)
lst = browser.find_elements(By.ID, "parent-1")
for parent in lst:
    target = parent.find_element(By.CLASS_NAME, "target-me-1")
    # print(target.text)
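The same parent-scoped search pattern can be verified without a browser; this stdlib sketch uses a cleaned-up version of the question's HTML (the attribute names and text values here are illustrative).

```python
import xml.etree.ElementTree as ET

html = """<div>
<div id="parent-1"><div class="fiasd"><div class="ehuh"><div class="target-me-1">one</div></div></div></div>
<div id="parent-1"><div class="fiasd"><div class="ehuh"><div class="target-me-1">two</div></div></div></div>
<div id="parent-1"><div class="fiasd"><div class="ehuh"><div class="target-me-1">three</div></div></div></div>
</div>"""

root = ET.fromstring(html)
# Each search is scoped to one parent element, so every parent
# yields its own target rather than the first one on the page.
targets = [parent.find(".//div[@class='target-me-1']").text
           for parent in root.findall("./div[@id='parent-1']")]
print(targets)  # ['one', 'two', 'three']
```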
I would like to get the movie names available between the "tracked_by" id and the "buzz_off" id. I have already created a selector which can grab names after the "tracked_by" id. However, my intention is to let the script keep parsing UNTIL it finds the "buzz_off" id. The elements containing the names are:
html = '''
<div class="list">
<a id="allow" name="allow"></a>
<h4 class="cluster">Allow</h4>
<div class="base min">Sally</div>
<div class="base max">Blood Diamond</div>
<a id="tracked_by" name="tracked_by"></a>
<h4 class="cluster">Tracked by</h4>
<div class="base min">Gladiator</div>
<div class="base max">Troy</div>
<a id="buzz_off" name="buzz_off"></a>
<h4 class="cluster">Buzz-off</h4>
<div class="base min">Heat</div>
<div class="base max">Matrix</div>
</div>
'''
from lxml import html as htm

root = htm.fromstring(html)
for item in root.cssselect("a#tracked_by ~ div.base"):
    print(item.text)
The selector I've tried with (also mentioned in the above script):
a#tracked_by ~ div.base
Results I'm having:
Gladiator
Troy
Heat
Matrix
Results I would like to get:
Gladiator
Troy
Btw, I would like to use this selector to parse the names, not to style anything.
CSS selectors are a matching syntax, not a programming language — they have no form of "stop here" logic. You'd have to loop in Python, handle each element one at a time until you reach the buzz_off anchor, and append the matches to a list.
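A minimal sketch of that loop idea, using only the standard library and the same markup as above: walk the direct children of the list <div>, start collecting after the tracked_by anchor, and stop at buzz_off.

```python
import xml.etree.ElementTree as ET

html = """<div class="list">
<a id="allow" name="allow"></a>
<h4 class="cluster">Allow</h4>
<div class="base min">Sally</div>
<div class="base max">Blood Diamond</div>
<a id="tracked_by" name="tracked_by"></a>
<h4 class="cluster">Tracked by</h4>
<div class="base min">Gladiator</div>
<div class="base max">Troy</div>
<a id="buzz_off" name="buzz_off"></a>
<h4 class="cluster">Buzz-off</h4>
<div class="base min">Heat</div>
<div class="base max">Matrix</div>
</div>"""

root = ET.fromstring(html)
names, collecting = [], False
for child in root:  # direct children only, in document order
    if child.tag == "a" and child.get("id") == "tracked_by":
        collecting = True          # start after this anchor
    elif child.tag == "a" and child.get("id") == "buzz_off":
        break                      # stop at this anchor
    elif collecting and child.tag == "div" and "base" in child.get("class", ""):
        names.append(child.text)

print(names)  # ['Gladiator', 'Troy']
```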
HTML of page:
<form name="compareprd" action="">
<div class="gridBox product " id="quickLookItem-1">
<div class="gridItemTop">
</div>
</div>
<div class="gridBox product " id="quickLookItem-2">
<div class="gridItemTop">
</div>
</div>
<!-- many more like this. -->
I am using Beautiful Soup to scrape a page. On that page I am able to get a form tag by its name.
tag = soup.find("form", {"name": "compareprd"})
Now I want to count all immediate child divs, but not the nested divs.
Say, for example, there are 20 immediate divs inside the form.
I tried :
len(tag.findChildren("div"))
But It gives 1500.
I think it gives all "div" inside "form" tag.
Any help appreciated.
You can use a single CSS selector, form[name=compareprd] > div, which will find divs that are immediate children of the form:
html = """<form name="compareprd" action="">
<div class="gridBox product " id="quickLookItem-1">
<div class="gridItemTop">
</div>
</div>
<div class="gridBox product " id="quickLookItem-2">
<div class="gridItemTop">
</div>
</div>
</form>"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
print(len(soup.select("form[name=compareprd] > div")))
Or, as commented, pass recursive=False, but use find_all; findChildren goes back to the bs2 days and is only provided for backwards compatibility.
len(tag.find_all("div", recursive=False))
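The distinction can be sanity-checked with the standard library alone: iterating an element only visits its immediate children, while a .// search matches every descendant (the markup mirrors the question's sample).

```python
import xml.etree.ElementTree as ET

form = ET.fromstring(
    '<form name="compareprd">'
    '<div class="gridBox product" id="quickLookItem-1"><div class="gridItemTop"></div></div>'
    '<div class="gridBox product" id="quickLookItem-2"><div class="gridItemTop"></div></div>'
    '</form>'
)
direct = [child for child in form if child.tag == "div"]  # immediate children only
nested = form.findall(".//div")                           # all div descendants
print(len(direct), len(nested))  # 2 4
```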