I have a table of search results in a Selenium-driven browser, and each search result is defined in HTML like this:
<div class="item
itemWrapper
ItemPosition1
ItemMonitor
" data-position="1" data-it-name="NAME OF THE ITEM" data-it-category="Category" role="article">
<div class="item-image">
<a href="/some/link/" target="_blank" rel="noopener" class="itemRec">
<img src="https://some.jpg" alt="some name" class="img-responsive">
</a>
</div>
<h2 class="small-text item-title">
Link Text
</h2>
<div class="item-bottom">
<div class="pull-left item-price">
<span>999</span>
</div>
<div class="pull-right detail-link">
<a href="/link/to/detail" title="link title" class="detail">
Detail
</a>
</div>
</div>
</div>
I am able to find all webelements by classname = item.
elements = driver.find_elements_by_class_name("item")
I need to iterate over the elements and get their position, name and price so that I can click on one of them:
for e in elements:
    position=e.get_attribute("data-position").value,
    name=e.get_attribute("data-it-name").value,
    price=e.find_element(By.CLASS_NAME,'item-price').value
but this does not work: get_attribute returns None and find_element does not find any child element.
Can you please advise me how to get the "data-" attributes and the child elements' values correctly?
Whole code using Webbot:
import webbot
from selenium.webdriver.common.by import By
web = webbot.Browser()
web.go_to('www.***.cz')
web.type('bed', classname='header-search-form')
web.press(web.Key.ENTER)
elements = web.find_elements(classname="product-item")
for e in elements:
    name = e.get_attribute("data-it-name").value
    price = e.find_element(By.CLASS_NAME, 'item-price').value
    print(name,price)
    break
The classname argument behaves unexpectedly in webbot. You are definitely not getting a product item there:
In [56]: elements[0].get_attribute('outerHTML')
Out[56]: '\n\n\t\t\t\t\t\t<img src="https://s.favi.cz/static/frontend/_global/images/favi-logo/favi-logo.60d511aff13247dd52f15acf6bdf2af9.svg" role="banner">\n\n\t\t\t\t\t'
Works well with a CSS selector:
In [58]: elements = web.find_elements(css_selector=".product-item")
In [59]: elements[0].get_attribute('outerHTML')
Out[59]: '<div class="\n\t\t\tproduct-item\n\t\t\titemWrapper\n\t\t\tproductItemPosition1\n\t\t\tproductItemMonitor\n\t\t\tproductItemWrapper\n\t\t\tsendProductTransactionWrapper\n\t\t\t\t\t" data-position="1" data-pr-name="Moderní box spring postel Alvares 160x200, bílá" data-tr-id="04d62b60-9d00-4d1b-b03c-2258c50bfdb9" data-pr-category="Postele" data-tr-ob-id="2144583" data-m-ob-id="2345478" role="article">\n\n\t\t<div class="product-image">\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<img src="https://s.favi.cz/static/images/t/product/300/6f/92/6f922779-bc84-483e-b1cd-ad8522ef0c92.jpg" alt="Moderní box spring postel Alvares 160x200, bílá" class="img-responsive">\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<span class="count">485</span>\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<div class="product-labels stickers-holder">\n\n\t\t\t\t\t\t\t<span class="sticker storage white">\n\t\t\t\t\t<span class="text">Skladem</span>\n\t\t\t\t</span>\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<h2 class="small-text product-item-title">\n\t\t\tModerní box spring postel Alvares 160x200, bílá\n\t\t</h2>\n\n\t\t<div class="product-bottom">\n\n\t\t\t<div class="pull-left product-item-price">\n\t\t\t\t<span>15 599 Kč</span>\n\t\t\t\t\t\t\t</div>\n\n\t\t\t<div class="pull-right product-shop-link">\n\t\t\t\t\n\t\t\t\t\tDetail\n\t\t\t\t\n\n\t\t\t\t\n\t\t\t\t\t<strong>Do obchodu</strong>\n\t\t\t\t\n\t\t\t</div>\n\n\t\t</div>\n\n\t\t\n\t</div>'
In [60]: elements[0].get_attribute('data-position')
Out[60]: '1'
In [61]: elements[0].get_attribute('data-pr-name')
Out[61]: 'Moderní box spring postel Alvares 160x200, bílá'
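Putting the pieces together, a minimal sketch of the loop from the question might look like the following. The helper names read_item and parse_price are made up for illustration, the data-pr-name attribute and the product-item-price class are taken from the outerHTML dump above, and "css selector" is the plain string behind Selenium's By.CSS_SELECTOR, so the lookup itself needs no import.

```python
import re


def parse_price(text):
    """Return the digits of a price label such as '15 599 Kč' as an int, or None."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def read_item(element):
    """Collect position, name and price from one product-item WebElement."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    price_element = element.find_element("css selector", ".product-item-price")
    return {
        "position": element.get_attribute("data-position"),
        "name": element.get_attribute("data-pr-name"),
        "price": parse_price(price_element.text),
    }
```

With webbot this could be called as [read_item(e) for e in web.find_elements(css_selector=".product-item")], assuming the returned objects are ordinary Selenium WebElements, as the session above suggests.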
Question:
Can I group found elements by the div class they're in and store them as lists within a list?
Is that possible?
After some further testing, it seems that even if you store one div in a variable and then search within that stored div, the search still covers the whole page content.
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# Let's say this is the class of the different divs I want to group by:
# @class='a-fixed-right-grid a-spacing-top-medium'
# These are the texts from all divs around the page that I'm looking for, but I can't tell which one belongs to which div
elements = driver.find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
for element in elements:
    result_text.append(element.text)
print(result_text)
Current Result:
I'm already getting all the information I'm looking for from different divs around the page but I want it to be "grouped" by the topmost div.
['Text11', 'Text12', 'Text2', 'Text31', 'Text32']
Result I want to achieve:
The text is grouped by the @class='a-fixed-right-grid a-spacing-top-medium'
[['Text11', 'Text12'], ['Text2'], ['Text31', 'Text32']]
HTML: (looks something like this)
class="a-text-center a-fixed-left-grid-col a-col-left" is the first one that wraps the group; from there on we can use any div to group it. At least I think so.
</div>
</div>
</div>
</div>
<div class="a-fixed-right-grid a-spacing-top-medium"><div class="a-fixed-right-grid-inner a-grid-vertical-align a-grid-top">
<div class="a-fixed-right-grid-col a-col-left" style="padding-right:3.2%;float:left;">
<div class="a-row">
<div class="a-fixed-left-grid a-spacing-base"><div class="a-fixed-left-grid-inner" style="padding-left:100px">
<div class="a-text-center a-fixed-left-grid-col a-col-left" style="width:100px;margin-left:-100px;float:left;">
<div class="item-view-left-col-inner">
<a class="a-link-normal" href="/gp/product/B07YCW79/ref=ppx_yo_dt_b_asin_image_o0_s00?ie=UTF8&psc=1">
<img alt="" src="https://images-eu.ssl-images-amazon.com/images/I/41rcskoL._SY90_.jpg" aria-hidden="true" onload="if (typeof uet == 'function') { uet('cf'); uet('af'); }" class="yo-critical-feature" height="90" width="90" title="Same as the text I'm looking for" data-a-hires="https://images-eu.ssl-images-amazon.com/images/I/41rsxooL._SY180_.jpg">
</a>
</div>
</div>
<div class="a-fixed-left-grid-col a-col-right" style="padding-left:1.5%;float:left;">
<div class="a-row">
<a class="a-link-normal" href="/gp/product/B07YCR79/ref=ppx_yo_dt_b_asin_title_o00_s0?ie=UTF8&psc=1">
Text I'm looking for
</a>
</div>
<div class="a-row">
I don't have the link to test it on, but this might work for you:
from selenium import webdriver
driver = webdriver.Chrome()
# The leading "." keeps the XPath relative to each div instead of the whole document
result_text = [[a.text for a in div.find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")]
               for div in driver.find_elements_by_class_name('a-fixed-right-grid')]
print(result_text)
EDIT: added alternative function:
# if that doesn't work try:
def get_results(selenium_driver, div_class, a_xpath):
    div_list = []
    for div in selenium_driver.find_elements_by_class_name(div_class):
        a_list = []
        # a_xpath should start with "." so the search stays inside this div
        for a in div.find_elements_by_xpath(a_xpath):
            a_list.append(a.text)
        div_list.append(a_list)
    return div_list
get_results(driver,
            div_class='a-fixed-right-grid',
            a_xpath=".//a[contains(@href, '/gp/product/')]")
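If that doesn't work, check that the XPath starts with a dot: an expression beginning with // is evaluated from the document root and returns every matching element on the page, even when it is called on a div, whereas .//a[...] is limited to that div's descendants. It is also possible that another element farther up the document has the same class name.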
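The scoping issue can also be guarded against in a small helper. This is only a sketch and has not been run against the real page: group_link_texts is a hypothetical name, "xpath" is the string value of Selenium's By.XPATH, and dropping empty texts (the links that only wrap an image) is an assumption about the desired output.

```python
def group_link_texts(root, container_xpath, link_xpath):
    """Return one list of link texts per container element.

    root is a WebDriver or WebElement; link_xpath must be a relative XPath.
    """
    if not link_xpath.startswith("."):
        # "//a" would be evaluated from the document root for every container
        raise ValueError("link_xpath must start with '.' to stay inside the container")
    groups = []
    for container in root.find_elements("xpath", container_xpath):
        texts = [a.text.strip() for a in container.find_elements("xpath", link_xpath)]
        groups.append([t for t in texts if t])  # skip links that only wrap an image
    return groups
```

For the page in the question, the container XPath could be "//div[contains(@class, 'a-fixed-right-grid') and contains(@class, 'a-spacing-top-medium')]" and the link XPath ".//a[contains(@href, '/gp/product/')]".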
I need to scrape some pages. The exact structure of the part that I want is as follows:
<div class="someclasses">
<h3>...</h3> # Not needed
<ul class="ul-class1 ul-class2">
<li id="li1-id" class="li-class1 li-class2">
<div id ="div1-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href1" data-control-id="id1" data-control-name="name" id ="a1-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text1</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
<li id="li2-id" class="li-class1 li-class2">
<div id ="div2-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href2" data-control-id="id2" data-control-name="name" id ="a2-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text2</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
# More <li> elements
</ul>
</div>
Now what I want is to get the texts as well as the hrefs. The naming in the example above is realistic, i.e. names that are identical in the example are also identical on the real webpage. The code that I am currently using is:
elems = driver.find_elements_by_xpath("//div[@class='someclasses']/ul[@class='ul-class1']/li[@class='li-class1']")
print(len(elems))
for elem in elems:
    elem1 = driver.find_element_by_xpath("./a[@data-control-name='name']")
    names2.append(elem1.text)
    print(elem1.text)
    hrefs.append(elem.get_attribute("href"))
The result of the print statement above is 0, so basically the elements are not found. Can anyone please tell me what I am doing wrong?
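You are matching on only part of the class name. In XPath, a test such as [@class='ul-class1'] compares against the full attribute value, so it fails when the element has several classes (here "ul-class1 ul-class2").
FYI: with CSS, a selector such as ul.ul-class1 matches an element that has that class among others.
If you want to use XPath, try: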
elems = driver.find_elements_by_xpath("//div[@class='someclasses']//li//a")
print(len(elems))
for elem in elems:
    names2.append(elem.text)
    print(elem.text)
    new_href = elem.get_attribute("href")
    print(new_href)
    hrefs.append(new_href)
For CSS use: div.someclasses ul.ul-class1
elems = driver.find_elements_by_css_selector("div.someclasses ul.ul-class1 li a")
for elem in elems:
    names2.append(elem.text)
    print(elem.text)
    new_href = elem.get_attribute("href")
    print(new_href)
    hrefs.append(new_href)
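If XPath is preferred and only one of several class names should be matched, a common idiom is to pad the class attribute with spaces and test for the padded token. The sketch below builds such an expression; has_class and link_xpath are hypothetical helper names, and the class names are the ones from the question.

```python
def has_class(name):
    """XPath predicate that is true when the class attribute contains the token name."""
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % name


def link_xpath():
    """XPath for the <a> elements inside the list items from the question."""
    return "//div[%s]/ul[%s]/li[%s]//a[@data-control-name='name']" % (
        has_class("someclasses"),
        has_class("ul-class1"),
        has_class("li-class1"),
    )
```

The result can be passed to driver.find_elements_by_xpath(link_xpath()); each returned element then provides .text and get_attribute("href") as in the loops above.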
I'm using Python and Selenium to scrape a website. I used find_element to find all the values that I need, but I've run into something more challenging. The website's HTML uses exactly the same structure for two different values, and I cannot use a simple find_element_by_class_name because they have the same classes and ids. I don't want to use an XPath or a selector because I am iterating over many "flight-row" divs and it would make things more hardcoded.
<div class="flight-row">
<div class="row row-eq-heights">
<div class="col-xs-4 col-md-4 no-padding"><span class="airline-name">gol</span><span class="flight-number">AM-477</span></div>
<div class="col-xs-4 col-md-4">
<div class="flight-timming"><span class="flight-time">06:15</span><span class="flight-destination">IAH</span></div><span class="flight-data">01/10/19</span></div>
<div class="col-xs-4 col-md-4 no-padding">
<div class="duration"><span class="flight-duration">21:25</span><span class="flight-stops" aria-label="Paradas do voo">2 paradas</span></div>
</div>
<div class="col-xs-4 col-md-4">
<div class="flight-timming"><span class="flight-destination">GIG</span><span class="flight-time">05:40</span></div><span class="flight-data">02/10/19</span></div>
</div>
</div>
I want to get the values of flight-time, flight-destination and flight-data from both "col-xs-4 col-md-4" divs.
This is part of my code:
outbound_flights = driver.find_elements_by_css_selector("div[class^='flight-item ']")
for outbound_flight in outbound_flights:
    airline = outbound_flight.find_element_by_css_selector("span[class='airline-name']")
Thank you!
Try the following CSS selector to get flight-time, flight-destination and flight-data:
outbound_flights = driver.find_elements_by_css_selector("div.col-xs-4.col-md-4:not(.no-padding)")
for outbound_flight in outbound_flights:
    flight_time = outbound_flight.find_element_by_css_selector("div.flight-timming span.flight-time").text
    print(flight_time)
    flight_destination = outbound_flight.find_element_by_css_selector("div.flight-timming span.flight-destination").text
    print(flight_destination)
    flight_data = outbound_flight.find_element_by_css_selector("span.flight-data").text
    print(flight_data)
Output on console:
06:15
IAH
01/10/19
05:40
GIG
02/10/19
EDITED Answer:
outbound_flights = driver.find_elements_by_css_selector("div.col-xs-4.col-md-4:not(.no-padding)")
flighttime=[]
for outbound_flight in outbound_flights:
    flight_time = outbound_flight.find_element_by_css_selector("div.flight-timming span.flight-time").text
    print(flight_time)
    flighttime.append(flight_time)
    flight_destination = outbound_flight.find_element_by_css_selector("div.flight-timming span.flight-destination").text
    print(flight_destination)
    flight_data = outbound_flight.find_element_by_css_selector("span.flight-data").text
    print(flight_data)
departure_time=flighttime[0]
arrival_time=flighttime[1]
print("Departure time :" + departure_time)
print("Arrival time :" + arrival_time)
You can get the values by index:
(//*[@class='flight-time'])[1] and (//*[@class='flight-time'])[2]
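Since the question iterates over many flight-row divs, it may be more robust to scope the lookup to each row and pair the two legs there, rather than relying on page-wide indexes. A sketch under that assumption follows; read_leg and read_flight are hypothetical helpers, LEG_SELECTOR reuses the selector from the answer above, and treating the first matching column as the departure and the second as the arrival is an assumption based on the sample HTML.

```python
LEG_SELECTOR = "div.col-xs-4.col-md-4:not(.no-padding)"


def read_leg(column):
    """Read time, airport and date from one departure/arrival column."""
    def text_of(selector):
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        return column.find_element("css selector", selector).text

    return {
        "time": text_of("span.flight-time"),
        "airport": text_of("span.flight-destination"),
        "date": text_of("span.flight-data"),
    }


def read_flight(row):
    """Return the departure and arrival legs of one flight-row element."""
    legs = [read_leg(col) for col in row.find_elements("css selector", LEG_SELECTOR)]
    if len(legs) != 2:
        raise ValueError("expected 2 legs per flight row, found %d" % len(legs))
    return {"departure": legs[0], "arrival": legs[1]}
```

It could be used as: for row in driver.find_elements_by_css_selector("div.flight-row"): print(read_flight(row)).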
I am using Selenium2Library (Python) for our automation. This is the method used:
def get_appointment_from_manage(self, date, appt_id):
    ref_date = "//*[@data-date=\"%s\"]" % date
    time.sleep(2)
    logging.info(date)
    logging.info(appt_id)
    while not self.is_element_present_by_xpath(ref_date):
        self._current_browser().find_element_by_xpath("//*[@id=\"calendar1\"]/div[1]/div[3]/div/button[2]").click()
        time.sleep(2)
    element = self._current_browser().find_element_by_xpath("//*[@data-aid=\"%s\"]" % appt_id)
    logging.info(element)
    ActionChains(self._current_browser()).move_to_element(element).click().perform()
The log shows that the element was found, but the click does not happen.
This is the part that isn't clicking:
element = self._current_browser().find_element_by_xpath("//*[@data-aid=\"%s\"]" % appt_id)
logging.info(element)
ActionChains(self._current_browser()).move_to_element(element).click().perform()
When I inspect the element, the whole element is highlighted in blue, so I don't know what I am missing. The Firefox version is 28. Thanks in advance!
EDIT
This is the html
<div class="fc-event-container">
<div class="fc-event-box" style="position:relative;z-index:1"></div>
<div data-aid="31" class="fc-event-data-container fc-status-2" style="position:absolute;top:0px;right:0;bottom:-62px;left:0;z-index:1">
<div class="fc-event-data-box">
<a class="fc-time-grid-event fc-event fc-start fc-end evnt-1419408000000" style="top: 0px; bottom: -62px; z-index: 1; left: 0%; right: 0%;">
<div class="fc-content">
<div class="fc-time" data-start="8:00" data-full="8:00 AM - 8:30 AM" style="display:none;">
<span>8:00 - 8:30</span>
</div>
<div class="fc-title">Robot-FN</div>
<span class="fc-product">Home Loans</span>
</div>
<div class="fc-bg"></div>
</a>
</div>
</div>
</div>
I'm not sure this is what you are trying to do, but if you want to click on the <a> tag (which is clickable), then you need to locate that element, not the <div> that contains it.
Try something like this (I didn't test this XPath, so take it as a general idea):
element = self._current_browser().find_element_by_xpath("//*[@data-aid=\"%s\"]//a" % appt_id)
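As a rough illustration of the same idea, the locator can be built in one place and the click issued directly on the link. Both function names are hypothetical, the data-aid attribute comes from the HTML in the question, and "xpath" is the string value of Selenium's By.XPATH; whether a plain click() is enough on this calendar widget has not been verified.

```python
def appointment_link_xpath(appt_id):
    """XPath for the <a> inside the container whose data-aid equals appt_id."""
    return '//*[@data-aid="%s"]//a' % appt_id


def click_appointment(driver, appt_id):
    """Locate the appointment's link (not its wrapping <div>) and click it."""
    link = driver.find_element("xpath", appointment_link_xpath(appt_id))
    link.click()
    return link
```

Inside the method from the question, the driver argument would be self._current_browser().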
The website I'm trying to scrape looks like this:
<div align="center" class="movietable">
<span style="width:45px;height:47px;vertical-align:middle;display:table-cell;">
<img border="0" src="styles/images/cat/hd.png" alt="HdO">
</span>
</div>
<div align="left" class="movietable">
<span style="padding:0px 5px;width:455px;height:47px;vertical-align:middle;display:table-cell;">
<a data-toggle="tooltip" data-placement="bottom" data-html="true" title="" href="details.php?id=578197" data-original-title="<img src='https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg'>">
<b>GET THIS TEXT</b></a><br><font class="small">[Action, Horror, Sci-Fi]</font>
</span>
</div>
How can I extract:
The text in the <b> tag - in this case GET THIS TEXT
The content of the <font class="small"> tag - in this case Action, Horror, Sci-Fi
.movietable b works great!!
The img src link - in this case https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg
I have no idea how to do this.
Below are CSS selectors you can use:
driver.find_element_by_css_selector('div[align=left] b')
driver.find_element_by_css_selector('div[align=left] .small')
driver.find_element_by_css_selector('a[title]').get_attribute('data-original-title')
You can access all of them using xpath:
1) [parents before this div]/div[2]/span/a/b
2) [parents before this div]/div[2]/span/font
3) [parents before this div]/div[1]/span/a/img
[parents before this div] should be /html/body/...
As per the HTML you have shared to extract the items you can use the following solution:
GET THIS TEXT:
driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span/a[@data-toggle='tooltip' and @data-placement='bottom']/b").get_attribute("innerHTML")
[Action, Horror, Sci-Fi]:
driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span//font[@class='small']").get_attribute("innerHTML")
https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg:
img_src = driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span/a[@data-toggle='tooltip' and @data-placement='bottom']").get_attribute("data-original-title")
src = img_src.replace("'", "-").split("-")
print(src[1])
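Splitting on quotes and hyphens works for this sample but would break on a URL that itself contains a hyphen. A slightly more defensive sketch uses a regular expression instead; extract_img_src and parse_genres are hypothetical helpers that operate on the strings returned by get_attribute("data-original-title") and by the text of the .small element.

```python
import re

# Matches the src attribute of the first <img> tag, with single or double quotes
_IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*(['"])(.*?)\1""", re.IGNORECASE)


def extract_img_src(tooltip_html):
    """Return the src of the first <img> in an HTML fragment, or None."""
    match = _IMG_SRC.search(tooltip_html)
    return match.group(2) if match else None


def parse_genres(label):
    """Turn '[Action, Horror, Sci-Fi]' into ['Action', 'Horror', 'Sci-Fi']."""
    inner = label.strip().strip("[]")
    return [part.strip() for part in inner.split(",") if part.strip()]
```

For example, extract_img_src(img_src) could replace the replace/split lines above, and parse_genres could be applied to the text of the element found with 'div[align=left] .small'.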