Looping over multiple tooltips - python

I am trying to get the names and affiliations of authors from a series of articles on this page (you'll need ProQuest access to view it). What I want to do is open all the tooltips at the top of the page and extract some HTML text from them. This is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
browser = webdriver.Firefox()
url = 'http://search.proquest.com/econlit/docview/56607849/citation/2876523144F544E0PQ/3?accountid=13042'
browser.get(url)
# insert your username and password here
n_authors = browser.find_elements_by_class_name('zoom')  # 'zoom' is the class name of the three tooltips that I want to open in my loop
author = []
institution = []
for a in n_authors:
    print(a)
    ActionChains(browser).move_to_element(a).click().perform()
    html_author = browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/a').get_attribute('innerHTML')
    html_institution = browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/p').get_attribute('innerHTML')
    author.append(html_author)
    institution.append(html_institution)
Although n_authors has three entries that are apparently different from one another, selenium fails to get the info from all tooltips, instead returning this:
author
#['Nuttall, William J.',
#'Nuttall, William J.',
#'Nuttall, William J.']
And the same happens for the institution. What am I getting wrong? Thanks a lot
EDIT:
The array containing the xpaths of the tooltips:
n_authors
#[<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{008a2ade-fc82-4114-b1bf-cc014d41c40f}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{c4c2d89f-3b8a-42cc-8570-735a4bd56c07}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{9d06cb60-df58-4f90-ad6a-43afeed49a87}")>]
Which has length 3, and the three elements are different, which is why I don't understand why selenium won't distinguish them.
EDIT 2:
Here is the relevant HTML
<span class="titleAuthorETC small">
<span style="display:none" class="title">false</span>
Jamasb, Tooraj
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_0" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a><script type="text/javascript">Tips.images = '/assets/r20161.1.0-4/pqc/javascript/prototip/images/prototip/';</script>; Nuttall, William J
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_1" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>; Pollitt, Michael G
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_2" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>.
UPDATE:
@parishodak's answer, for some reason, does not work with Firefox unless I manually hover over the tooltips first. It works with chromedriver, but only if I first hover over the tooltips and only if I add a time.sleep(), as in
import itertools
import time
from selenium.common.exceptions import NoSuchElementException
for i in itertools.count():
    try:
        tooltip = browser.find_element_by_xpath('//*[@id="resolverCitation_previewTrigger_' + str(i) + '"]')
        print(tooltip)
        ActionChains(browser).move_to_element(tooltip).perform()  # hover over the tooltip trigger
    except NoSuchElementException:
        break
time.sleep(2)
elements = browser.find_elements_by_xpath('//*[@id="authorResolveLinks"]/li/div/a')
author = []
for e in elements:
    print(e)
    attribute = e.get_attribute('innerHTML')
    author.append(attribute)

It returns the same element every time because the XPath does not change between loop iterations, so find_element_by_xpath always returns the first match.
There are two ways to deal with this:
Use index notation in the XPath, as shown below:
browser.find_element_by_xpath('(//*[@id="authorResolveLinks"]/li/div/a)[1]').get_attribute('innerHTML')
browser.find_element_by_xpath('(//*[@id="authorResolveLinks"]/li/div/a)[2]').get_attribute('innerHTML')
browser.find_element_by_xpath('(//*[@id="authorResolveLinks"]/li/div/a)[3]').get_attribute('innerHTML')
Or
Use find_elements_by_xpath instead of find_element_by_xpath:
elements = browser.find_elements_by_xpath('//*[@id="authorResolveLinks"]/li/div/a')
Then loop over elements and call get_attribute('innerHTML') on each one.
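The second option can be sketched as a small helper. This is a minimal sketch, not the answerer's code: collect_inner_html is a hypothetical name, and FakeDriver and FakeElement are stand-ins (not Selenium classes) so the example runs without a browser. With a real driver, the string "xpath" is the value of By.XPATH.

```python
def collect_inner_html(driver, xpath):
    """Return the innerHTML of every element matched by xpath, in document order."""
    elements = driver.find_elements("xpath", xpath)  # same as By.XPATH
    return [element.get_attribute("innerHTML") for element in elements]


# Stand-ins so the sketch runs without a browser (hypothetical, not Selenium classes).
class FakeElement:
    def __init__(self, html):
        self._html = html

    def get_attribute(self, name):
        return self._html if name == "innerHTML" else None


class FakeDriver:
    def __init__(self, htmls):
        self._elements = [FakeElement(html) for html in htmls]

    def find_elements(self, by, value):
        # A real driver would evaluate the locator; the fake returns fixed sample data.
        return list(self._elements)


driver = FakeDriver(["Jamasb, Tooraj", "Nuttall, William J.", "Pollitt, Michael G."])
authors = collect_inner_html(driver, '//*[@id="authorResolveLinks"]/li/div/a')
print(authors)
```

Because the helper reads every match in one call, it does not depend on the XPath changing between iterations.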

Related

Selenium python : my current_url doesnt update after click

I'm scraping a website where I need to retrieve values from the URL after clicking a button that submits different form values.
My problem: when I click the button and read current_url, the values entered in the forms are not reflected in the URL, which should be updated (it's a search button). No new tab is created.
My code to retrieve the URL value is:
driver = webdriver.Firefox()
driver.get(url)
arrlist = []
idlist = []
service = value
for i in key_list:
    form = driver.find_elements(by=By.XPATH, value='//input[@id="geo_nav"]')
    form[0].send_keys(i)
    form2 = driver.find_elements(by=By.XPATH, value='//input[@id="sev_nav"]')
    form2[0].send_keys(service)
    button = driver.find_elements(by=By.XPATH, value='//button[@data-role="filter-apply"]')
    button[0].click()
    time.sleep(5)
    url = driver.current_url
    print(dept)
    print(i)
    id = re.findall(r"(?<=\[population\]=)(\d{9})", url)[0]
    arrlist.append(i)
    idlist.append(id)
The button HTML is:
<button class="filter-apply cta-navigate relative hide-mobile flex withNumber" data-role="filter-apply">
<p class="hide-mobile m-r-4">Appliquer</p>
<div class="svg relative">
<span class="filters-apply-length">2</span>
<svg height="18" viewBox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg"><path d="m10.877 17.457 2.026 1.533v-4.553c0-.166.042-.329.12-.475l4.3-7.962h-10.68l4.122 7.978c.074.142.112.3.112.459zm3.026 4.543c-.213 0-.426-.068-.603-.203l-4.026-3.045c-.25-.189-.397-.484-.397-.797v-3.274l-4.765-9.222c-.161-.31-.148-.681.034-.979.181-.298.505-.48.854-.48h14c.352 0 .678.185.859.488.18.302.188.677.021.987l-4.977 9.215v6.31c0 .379-.214.726-.554.895-.141.07-.294.105-.446.105z" fill="#0579c7" fill-rule="evenodd" transform="translate(-4 -4)"></path></svg> </div>
</button>
I've tried to use
driver.switch_to.window(driver.window_handles[-1])
following this post: Python Selenium Chromedriver - Can't Get current_url of new opened tab after click()
But I don't have a new tab or new window issue.
I tried clicking the autocompletion lists in the two form inputs. One of the forms changes the URL, but the other (the one whose effect on the URL I need to monitor) does not.
The form code that works is:
<form data-component="sev_nav_input" data-no-results="Sans résultats" data-default-pho="Services" data-selected-name="Achat compulsif" data-selected-id="5928" class="filter-input filter-services relative">
<input type="text" placeholder="Services" autocomplete="off" name="sev_nav" id="sev_nav" data-role="js_filter" data-id="5928" class="autocomplete-with-result">
<span id="clear-sev-input" class="clear-sev-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="services-list" class="services-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div></form>
The form code that doesn't work is:
<form data-component="geo_nav_input" data-selected-name="" data-selected-id="" data-selected-neighborhood-id="0" data-selected-type="" data-no-results="Sans résultats" data-pho="Localité" data-default-pho="Localité" class="filter-input relative">
<div class="hide">Chercher des professionnels en/à...</div>
<span class="icon-x toggle_geo_nav hide"></span>
<label for="geo_nav" class="hidden-label">Localité</label>
<input type="text" placeholder="Localité" autocomplete="off" name="geo_nav" id="geo_nav" data-role="js_filter" data-id="" data-neighborhoodid="0" data-type="" class="autocomplete-with-result"> <span id="clear-geo-input" class="clear-geo-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="location-list" class="location-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div>
</form>
You could write a function that navigates the pages and performs the actions you need on each page. On each call, use driver.switch_to.window to make sure you are on the latest page.
Based on your edits, though, it now seems the issue is that you are having trouble locating and following one of the links on the pages.
def navigate(n):
    """Move through the pages. Select the relevant buttons on each page."""
    window_after = driver.window_handles[-1]  # most recently opened window
    driver.switch_to.window(window_after)
    if n == 0:
        form = driver.find_element(by=By.XPATH, value='//input[@id="geo_nav"]')
        # find_element returns a single element, which can be clicked directly
        driver.find_element(by=By.XPATH, value='//button[@data-role="filter-apply"]').click()
    elif n == 1:
        pass
        # Do something
    else:
        pass
        # Do something else
for i in range(3):
    navigate(i)
    time.sleep(3)
The solution was in fact linked to the autocompletion forms. You have to click one of the autocompletion suggestions for the button to actually work.
FYI, here is the full code: it clicks the form, deletes its content, types the new content, clicks the matching suggestion in the list, and clicks the button.
def get_city_locations(service):
    url = 'url'
    #options = Options()
    #options.headless = True
    driver = webdriver.Firefox()  #options=options)
    driver.get(url)
    time.sleep(2)
    buttoncookie = driver.find_elements(by=By.XPATH, value='//button[@class="cf2Lf6"]')
    buttoncookie[0].click()
    time.sleep(1)
    form2 = driver.find_elements(by=By.XPATH, value='//input[@id="sev_nav"]')
    form2[0].click()
    time.sleep(1)
    Static.clear_text(driver)  # the author's own helper (not shown)
    form2[0].send_keys(service)
    time.sleep(1)
    autocompleteservice = driver.find_elements(by=By.XPATH, value='//li[not(@class)]')
    for f in autocompleteservice:
        if f.text == service:
            f.click()
    df_pref = pd.read_csv('arrondissement_2022.csv', sep=',')
    deptlist = []
    arrlist = []
    idlist = []
    for i in df_pref['LIBELLE']:
        df_dep = df_pref[df_pref['LIBELLE'] == i]
        dept = df_dep.loc[df_dep.index.values[0], 'DEP']
        form = driver.find_elements(by=By.XPATH, value='//input[@id="geo_nav"]')
        form[0].click()
        time.sleep(1)
        Static.clear_text(driver)
        form[0].send_keys(i)
        time.sleep(3)
        autocompletelocation = driver.find_elements(by=By.XPATH, value='//li[not(@class)]')
        cond = 0
        for a in autocompletelocation:
            if a.text == i:
                print('condition ok')
                cond = 1
                a.click()
                break
        time.sleep(3)
        button = driver.find_elements(by=By.XPATH, value='//button[@data-role="filter-apply"]')
        button[0].click()
        time.sleep(3)
        driver.switch_to.window(driver.window_handles[-1])
        url = driver.current_url
        print(dept)
        print(i)
        print(url)
        if cond == 0:
            id = 0
        else:
            id = re.findall(r"(?<=\[population\]=)(\d{7,10})", url)[0]
        print(f'id = {id}')
        print('\n')
        deptlist.append(dept)
        arrlist.append(i)
        idlist.append(id)
    df0 = pd.DataFrame({"dept": deptlist, "arrondissement": arrlist, "id": idlist})
    df0.to_csv('arr_id.csv', sep=';', index=False)
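The ID extraction step can also be isolated into a small function that falls back to a default instead of indexing an empty findall result. This is a minimal sketch under assumptions: extract_population_id is a hypothetical name, the example URL is made up for illustration, and decoding with unquote first is an added precaution in case the brackets arrive percent-encoded.

```python
import re
from urllib.parse import unquote

# Same pattern as in the code above: 7 to 10 digits right after "[population]=".
POPULATION_ID = re.compile(r"(?<=\[population\]=)(\d{7,10})")


def extract_population_id(url, default=0):
    """Return the population id found in url, or default when there is none."""
    match = POPULATION_ID.search(unquote(url))
    return match.group(1) if match else default


# Hypothetical URLs, for illustration only.
example = "https://example.com/search?filter[population]=123456789&filter[service]=5928"
encoded = "https://example.com/search?filter%5Bpopulation%5D=1234567"
print(extract_population_id(example))
print(extract_population_id(encoded))
print(extract_population_id("https://example.com/search"))
```

Returning a default keeps the loop running when a location has no matching suggestion, which is what the cond flag handles in the full code.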

Python/Selenium- Class element cannot be clicked

I am trying to scrape data from this page: https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2
Here I am trying to expand all of the "compare odds" fields, which are contained in this HTML:
<div class="table-container">
<div class="table-header-light even"><strong>Over/Under +1 </strong><span class="avg chunk-odd-payout">93.4%</span><span class="avg chunk-odd nowrp">5.63</span>
<span
class="avg chunk-odd nowrp">1.12</span><span class="odds-cnt">(3)</span><span class="odds-co"><a class="more" href="" onclick="page.togleTableContent('P-1.00-0-0',this);return false;">Compare odds</a></span></div>
</div>
<div class="table-container" style="display: none;">
<div class="table-header-light"><strong>Over/Under +1.25 </strong><span class="avg chunk-odd-payout"></span><span class="avg chunk-odd nowrp"></span><span class="avg chunk-odd nowrp"></span>
<span
class="odds-cnt">(0)</span><span class="odds-co"><a class="more" href="" onclick="page.togleTableContent('P-1.25-0-0',this);return false;">Compare odds</a></span></div>
</div>
The part I am trying to access is the following:
span class="odds-co">Compare odds
I have tried all of the following:
#odds_rows = browser.find_elements_by_class_name('more')
# odds_rows=browser.find_elements_by_css_selector(".more")
# odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='more']")))
odds_rows= WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".more")))
#odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "more")))
In order to subsequently loop click through the identified fields:
for i in odds_rows:
    #browser.execute_script("arguments[0].click();", i)
    i.click()
However already in the step of identifying the fields I am getting a timeout error on all WebDriverWait attempts except for
odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "more")))
This option yields only one result:
[<selenium.webdriver.remote.webelement.WebElement (session="7cbc57173a57aadbc115264dff8ca620", element="3654f928-bca4-4033-9566-da9e6aa6294b")>]
However this result is not clicked subsequently.
What am I doing wrong?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get("https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;0.50;0")
driver.maximize_window()
odds_rows = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.table-header-light')))
for i in odds_rows:
    count = i.find_element_by_xpath('./span[@class="odds-cnt"]')
    elem = i.find_elements_by_xpath('.//*[contains(text(),"Compare")]')
    txt = count.text
    # Skip rows whose count is empty: they are hidden and cannot be clicked
    if txt != '' and len(elem):
        elem = elem[0]
        driver.execute_script("arguments[0].scrollIntoView();", elem)
        elem.click()
The issue is that a row whose count is '' is not visible, so you cannot click it.
If you click on 'Compare odds' you can see the URL change from
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2
to
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;0.50;0
If you keep clicking, you will see that the last part,
2;0.50;0, increases by 0.5 each time.
The next ones are
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;1.00;0
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;1.50;0
and so on.
Alternatively, the data is in the class "table-main detail-odds sortable", which is hidden by default. You DON'T need to click; you can scrape that class directly.
I hope this helps.
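The URL pattern described in this answer can be generated rather than typed by hand. This is a minimal sketch: over_under_urls is a hypothetical helper, the 0.50 step is taken from the answer, and whether the site actually renders each line when only the fragment changes has not been verified here.

```python
from decimal import Decimal

BASE = "https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/"


def over_under_urls(base, first="0.50", step="0.50", count=4):
    """Build URLs whose fragment follows the '#over-under;2;<line>;0' pattern."""
    urls = []
    line = Decimal(first)  # Decimal avoids float rounding when stepping
    for _ in range(count):
        urls.append(f"{base}#over-under;2;{line:.2f};0")
        line += Decimal(step)
    return urls


urls = over_under_urls(BASE)
for url in urls:
    print(url)
```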

How to iterate over children webelements in Python Webbot/Selenium?

I have a table of search results in Selenium browser and each search result is defined in Html like this:
<div class="item
itemWrapper
ItemPosition1
ItemMonitor
" data-position="1" data-it-name="NAME OF THE ITEM" data-it-category="Category" role="article">
<div class="item-image">
<a href="/some/link/" target="_blank" rel="noopener" class="itemRec">
<img src="https://some.jpg" alt="some name" class="img-responsive">
</a>
</div>
<h2 class="small-text item-title">
Link Text
</h2>
<div class="item-bottom">
<div class="pull-left item-price">
<span>999</span>
</div>
<div class="pull-right detail-link">
<a href="/link/to/detail" title="link title" class="detail">
Detail
</a>
</div>
</div>
</div>
I am able to find all webelements by classname = item.
elements = driver.find_elements_by_class_name("item")
I would need to iterate over elements and get their position, name and price to be able to click to one of them:
for e in elements:
    position=e.get_attribute("data-position").value,
    name=e.get_attribute("data-it-name").value,
    price=e.find_element(By.CLASS_NAME,'item-price').value
But this does not work: get_attribute returns None and find_element does not find any child element.
Can you please advise me how to get the "data-" attributes and the child elements' values correctly?
Whole code using Webbot:
import webbot
from selenium.webdriver.common.by import By
web = webbot.Browser()
web.go_to('www.***.cz')
web.type('bed', classname='header-search-form')
web.press(web.Key.ENTER)
elements = web.find_elements(classname="product-item")
for e in elements:
    name = e.get_attribute("data-it-name").value
    price = e.find_element(By.CLASS_NAME, 'item-price').value
    print(name,price)
    break
classname acts weirdly in webbot. You definitely are not getting a product item there:
In [56]: elements[0].get_attribute('outerHTML')
Out[56]: '\n\n\t\t\t\t\t\t<img src="https://s.favi.cz/static/frontend/_global/images/favi-logo/favi-logo.60d511aff13247dd52f15acf6bdf2af9.svg" role="banner">\n\n\t\t\t\t\t'
Works well with a CSS selector:
In [58]: elements = web.find_elements(css_selector=".product-item")
In [59]: elements[0].get_attribute('outerHTML')
Out[59]: '<div class="\n\t\t\tproduct-item\n\t\t\titemWrapper\n\t\t\tproductItemPosition1\n\t\t\tproductItemMonitor\n\t\t\tproductItemWrapper\n\t\t\tsendProductTransactionWrapper\n\t\t\t\t\t" data-position="1" data-pr-name="Moderní box spring postel Alvares 160x200, bílá" data-tr-id="04d62b60-9d00-4d1b-b03c-2258c50bfdb9" data-pr-category="Postele" data-tr-ob-id="2144583" data-m-ob-id="2345478" role="article">\n\n\t\t<div class="product-image">\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<img src="https://s.favi.cz/static/images/t/product/300/6f/92/6f922779-bc84-483e-b1cd-ad8522ef0c92.jpg" alt="Moderní box spring postel Alvares 160x200, bílá" class="img-responsive">\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<span class="count">485</span>\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<div class="product-labels stickers-holder">\n\n\t\t\t\t\t\t\t<span class="sticker storage white">\n\t\t\t\t\t<span class="text">Skladem</span>\n\t\t\t\t</span>\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<h2 class="small-text product-item-title">\n\t\t\tModerní box spring postel Alvares 160x200, bílá\n\t\t</h2>\n\n\t\t<div class="product-bottom">\n\n\t\t\t<div class="pull-left product-item-price">\n\t\t\t\t<span>15 599 Kč</span>\n\t\t\t\t\t\t\t</div>\n\n\t\t\t<div class="pull-right product-shop-link">\n\t\t\t\t\n\t\t\t\t\tDetail\n\t\t\t\t\n\n\t\t\t\t\n\t\t\t\t\t<strong>Do obchodu</strong>\n\t\t\t\t\n\t\t\t</div>\n\n\t\t</div>\n\n\t\t\n\t</div>'
In [60]: elements[0].get_attribute('data-position')
Out[60]: '1'
In [61]: elements[0].get_attribute('data-pr-name')
Out[61]: 'Moderní box spring postel Alvares 160x200, bílá'
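Building on the CSS selector result, the loop from the question could look roughly like the sketch below. Two things are worth noting: get_attribute already returns a string, so no .value is needed, and the price is read through the child element's text. The function summarize_items is a hypothetical name, the attribute and class names come from the outerHTML shown above, and FakeItem and FakePrice are stand-ins (not Selenium classes) so the sketch runs without a browser.

```python
def summarize_items(elements):
    """Collect position, name and price from each product element."""
    rows = []
    for element in elements:
        # "css selector" is the value of By.CSS_SELECTOR
        price = element.find_element("css selector", ".product-item-price")
        rows.append({
            "position": element.get_attribute("data-position"),
            "name": element.get_attribute("data-pr-name"),
            "price": price.text.strip(),
        })
    return rows


# Stand-ins so the sketch runs without a browser (hypothetical, not Selenium classes).
class FakePrice:
    def __init__(self, text):
        self.text = text


class FakeItem:
    def __init__(self, attrs, price_text):
        self._attrs = attrs
        self._price = FakePrice(price_text)

    def get_attribute(self, name):
        return self._attrs.get(name)

    def find_element(self, by, value):
        return self._price


items = [FakeItem({"data-position": "1", "data-pr-name": "Sample bed"}, "\n15 599 Kč\n")]
rows = summarize_items(items)
print(rows)
```

With webbot, the elements list would come from web.find_elements(css_selector=".product-item") as in the session above.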

Group in list by div class

Question:
Can I group the found elements by the div class they're in and store them as lists within a list?
Is that possible?
*I did some further testing. It seems that even if you store one div in a variable and then search within that stored div, the search still covers the whole page content.
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# Let's say this is the class of the different divs I want to group by
#class='a-fixed-right-grid a-spacing-top-medium'
# These are the texts from all divs around the page that I'm looking for, but I can't say which one belongs in which div
elements = driver.find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
for element in elements:
    result_text.append(element.text)
print(result_text)
Current Result:
I'm already getting all the information I'm looking for from different divs around the page but I want it to be "grouped" by the topmost div.
['Text11', 'Text12', 'Text2', 'Text31', 'Text32']
Result I want to achieve:
The text is grouped by the @class='a-fixed-right-grid a-spacing-top-medium'
[['Text11', 'Text12'], ['Text2'], ['Text31', 'Text32']]
HTML: (looks something like this)
class="a-text-center a-fixed-left-grid-col a-col-left" is the first one that wraps the group; from there on, we can use any div to group it. At least I think so.
</div>
</div>
</div>
</div>
<div class="a-fixed-right-grid a-spacing-top-medium"><div class="a-fixed-right-grid-inner a-grid-vertical-align a-grid-top">
<div class="a-fixed-right-grid-col a-col-left" style="padding-right:3.2%;float:left;">
<div class="a-row">
<div class="a-fixed-left-grid a-spacing-base"><div class="a-fixed-left-grid-inner" style="padding-left:100px">
<div class="a-text-center a-fixed-left-grid-col a-col-left" style="width:100px;margin-left:-100px;float:left;">
<div class="item-view-left-col-inner">
<a class="a-link-normal" href="/gp/product/B07YCW79/ref=ppx_yo_dt_b_asin_image_o0_s00?ie=UTF8&psc=1">
<img alt="" src="https://images-eu.ssl-images-amazon.com/images/I/41rcskoL._SY90_.jpg" aria-hidden="true" onload="if (typeof uet == 'function') { uet('cf'); uet('af'); }" class="yo-critical-feature" height="90" width="90" title="Same as the text I'm looking for" data-a-hires="https://images-eu.ssl-images-amazon.com/images/I/41rsxooL._SY180_.jpg">
</a>
</div>
</div>
<div class="a-fixed-left-grid-col a-col-right" style="padding-left:1.5%;float:left;">
<div class="a-row">
<a class="a-link-normal" href="/gp/product/B07YCR79/ref=ppx_yo_dt_b_asin_title_o00_s0?ie=UTF8&psc=1">
Text I'm looking for
</a>
</div>
<div class="a-row">
I don't have the link to test it on, but this might work for you:
from selenium import webdriver
driver = webdriver.Chrome()
# The leading dot in './/a' keeps the search inside each div
result_text = [[a.text for a in div.find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")]
               for div in driver.find_elements_by_class_name('a-fixed-right-grid')]
print(result_text)
EDIT: added alternative function:
# if that doesn't work try:
def get_results(selenium_driver, div_class, a_xpath):
    div_list = []
    for div in selenium_driver.find_elements_by_class_name(div_class):
        a_list = []
        for a in div.find_elements_by_xpath(a_xpath):
            a_list.append(a.text)
        div_list.append(a_list)
    return div_list
get_results(driver,
            div_class='a-fixed-right-grid',
            a_xpath=".//a[contains(@href, '/gp/product/')]")
If every group still contains every link, check that the XPath starts with a dot: an XPath beginning with // is evaluated against the whole document even when it is called on a div. It is also possible that another element farther up the document has the same class name.
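The grouping idea itself, searching relative to each container rather than the whole document, can be tried without a browser. This is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium; the PAGE markup and the group_links name are made up for illustration, and ElementTree supports only a subset of XPath, so the href filter is done in Python.

```python
import xml.etree.ElementTree as ET

# Made-up markup that mimics three grouped containers.
PAGE = """
<body>
  <div class="a-fixed-right-grid a-spacing-top-medium">
    <div><a href="/gp/product/A1">Text11</a><a href="/gp/product/A2">Text12</a></div>
  </div>
  <div class="a-fixed-right-grid a-spacing-top-medium">
    <a href="/gp/product/B1">Text2</a>
    <a href="/help">Help</a>
  </div>
  <div class="a-fixed-right-grid a-spacing-top-medium">
    <a href="/gp/product/C1">Text31</a><a href="/gp/product/C2">Text32</a>
  </div>
</body>
""".strip()


def group_links(root, div_class, href_part):
    """Return one list of link texts per container div."""
    groups = []
    for div in root.findall(f".//div[@class='{div_class}']"):
        # './/a' is relative to this div, so links from other groups are excluded
        links = div.findall(".//a")
        groups.append([a.text for a in links if href_part in a.get("href", "")])
    return groups


root = ET.fromstring(PAGE)
grouped = group_links(root, "a-fixed-right-grid a-spacing-top-medium", "/gp/product/")
print(grouped)
```

The same shape carries over to Selenium: find the container elements first, then run a dot-prefixed XPath on each one.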

Selenium: Use Multiple Strings in find_element_by_partial_link_text()

I want to click a link that contains either the partial strings foo OR bar in the link text.
Something like:
elem = driver.find_element_by_partial_link_text(['foo','bar']).click()
or if it was using a str.contains("foo|bar") style:
elem = driver.find_element_by_partial_link_text('foo|bar').click()
What's the right way to do this?
EDIT
Example HTML:
<a class="noprint" href="/Docs/Doc?request=62391270&eCode=0XrIMF9p%2BMKSvdpdpqC5Nd3VFn4fB1eLXC3X0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">foo</a>
Or one with bar
<a class="noprint" href="/DocView/Doc?request=62391270&eCode=CWJ1stkSu3qFZ1coGTEsM8ka4xqU0XrIMF9p%2BfB1eLXC3wh4xPFQnYwOqX0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">bar</a>
EDIT2 The final working code was:
elem = driver.find_element_by_xpath("//a[contains(text(), 'foo') or contains(text(), 'bar')]").click()
If you don't want to write N conditional lines, you can use the or operator in an XPath expression.
Example: //a[contains(text(), 'aaaaa') or contains(text(), 'bbbbb')]
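For more than two alternatives, the expression can be assembled from a list. This is a minimal sketch: xpath_any_text is a hypothetical helper, and it assumes the search strings contain no single quotes, since those would need extra escaping in XPath.

```python
def xpath_any_text(tag, parts):
    """Build an XPath matching <tag> elements whose text contains any of parts."""
    if not parts:
        raise ValueError("at least one search string is required")
    for part in parts:
        if "'" in part:
            raise ValueError("single quotes are not supported in this sketch")
    conditions = " or ".join(f"contains(text(), '{part}')" for part in parts)
    return f"//{tag}[{conditions}]"


xpath = xpath_any_text("a", ["foo", "bar"])
print(xpath)
```

The resulting string can be passed to the driver's XPath lookup in place of the hand-written expression.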
I can get around it in the short term by doing this but it seems like a crappy solution:
try:
    elem = driver.find_element_by_partial_link_text('foo').click()
except:
    elem = driver.find_element_by_partial_link_text('bar').click()
