Selenium: Use Multiple Strings in find_element_by_partial_link_text() - python

I want to click a link that contains either the partial strings foo OR bar in the link text.
Something like:
elem = driver.find_element_by_partial_link_text(['foo','bar']).click()
or, if it worked in a str.contains("foo|bar") style:
elem = driver.find_element_by_partial_link_text('foo|bar').click()
What's the right way to do this?
EDIT
Example HTML:
<a class="noprint" href="/Docs/Doc?request=62391270&eCode=0XrIMF9p%2BMKSvdpdpqC5Nd3VFn4fB1eLXC3X0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">foo</a>
Or one with bar
<a class="noprint" href="/DocView/Doc?request=62391270&eCode=CWJ1stkSu3qFZ1coGTEsM8ka4xqU0XrIMF9p%2BfB1eLXC3wh4xPFQnYwOqX0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">bar</a>
EDIT2 The final working code was:
elem = driver.find_element_by_xpath("//a[contains(text(), 'foo') or contains(text(), 'bar')]").click()

If you don't want to write N conditional lines, you can combine the conditions with the or operator inside a single XPath expression.
Example: //a[contains(text(), 'aaaaa') or contains(text(), 'bbbbb')]
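For more than two substrings, the same or-expression can be assembled programmatically. The following is only a minimal sketch and not part of the original answer: the helper name build_any_text_xpath is hypothetical, and it assumes that none of the substrings contains a single quote.

```python
def build_any_text_xpath(substrings, tag="a"):
    """Build an XPath matching <tag> elements whose text contains any substring.

    Hypothetical helper for illustration. Assumes no substring contains a
    single quote, because the values are embedded in '...' XPath literals.
    """
    if not substrings:
        raise ValueError("at least one substring is required")
    # One contains() test per substring, joined with the XPath 'or' operator
    conditions = " or ".join(
        "contains(text(), '{}')".format(s) for s in substrings
    )
    return "//{}[{}]".format(tag, conditions)


xpath = build_any_text_xpath(["foo", "bar"])
print(xpath)  # //a[contains(text(), 'foo') or contains(text(), 'bar')]
```

The resulting string can be passed to find_element_by_xpath in the same way as the hand-written expression.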

I can get around it in the short term by doing this but it seems like a crappy solution:
try:
    elem = driver.find_element_by_partial_link_text('foo').click()
except:
    elem = driver.find_element_by_partial_link_text('bar').click()

Related

Group in list by div class

Question:
Can I group found elements by a div class they're in and store them in lists in a list.
Is that possible?
*After some further testing: even if I store one div in a variable and then search inside that stored div, the search still seems to cover the whole page content.
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# Let's say this is the class of the different divs I want to group by:
# @class='a-fixed-right-grid a-spacing-top-medium'
# These are the texts from all divs around the page that I'm looking for, but I can't say which one belongs in which div
elements = driver.find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
for element in elements:
    result_text.append(element.text)
print(result_text)
Current Result:
I'm already getting all the information I'm looking for from different divs around the page but I want it to be "grouped" by the topmost div.
['Text11', 'Text12', 'Text2', 'Text31', 'Text32']
Result I want to achieve:
The text is grouped by the div with @class='a-fixed-right-grid a-spacing-top-medium'
[['Text11', 'Text12'], ['Text2'], ['Text31', 'Text32']]
HTML: (looks something like this)
The div with class="a-text-center a-fixed-left-grid-col a-col-left" is the first one that wraps the group; from there on, any enclosing div could be used to group it. At least I think so.
</div>
</div>
</div>
</div>
<div class="a-fixed-right-grid a-spacing-top-medium"><div class="a-fixed-right-grid-inner a-grid-vertical-align a-grid-top">
<div class="a-fixed-right-grid-col a-col-left" style="padding-right:3.2%;float:left;">
<div class="a-row">
<div class="a-fixed-left-grid a-spacing-base"><div class="a-fixed-left-grid-inner" style="padding-left:100px">
<div class="a-text-center a-fixed-left-grid-col a-col-left" style="width:100px;margin-left:-100px;float:left;">
<div class="item-view-left-col-inner">
<a class="a-link-normal" href="/gp/product/B07YCW79/ref=ppx_yo_dt_b_asin_image_o0_s00?ie=UTF8&psc=1">
<img alt="" src="https://images-eu.ssl-images-amazon.com/images/I/41rcskoL._SY90_.jpg" aria-hidden="true" onload="if (typeof uet == 'function') { uet('cf'); uet('af'); }" class="yo-critical-feature" height="90" width="90" title="Same as the text I'm looking for" data-a-hires="https://images-eu.ssl-images-amazon.com/images/I/41rsxooL._SY180_.jpg">
</a>
</div>
</div>
<div class="a-fixed-left-grid-col a-col-right" style="padding-left:1.5%;float:left;">
<div class="a-row">
<a class="a-link-normal" href="/gp/product/B07YCR79/ref=ppx_yo_dt_b_asin_title_o00_s0?ie=UTF8&psc=1">
Text I'm looking for
</a>
</div>
<div class="a-row">
I don't have the link to test it on but this might work for you:
from selenium import webdriver
driver = webdriver.Chrome()
result_text = [[a.text for a in div.find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")]
               for div in driver.find_elements_by_class_name('a-fixed-right-grid')]
print(result_text)
EDIT: added alternative function:
# if that doesn't work try:
def get_results(selenium_driver, div_class, a_xpath):
    div_list = []
    for div in selenium_driver.find_elements_by_class_name(div_class):
        a_list = []
        for a in div.find_elements_by_xpath(a_xpath):
            a_list.append(a.text)
        div_list.append(a_list)
    return div_list
get_results(driver,
            div_class='a-fixed-right-grid',
            a_xpath=".//a[contains(@href, '/gp/product/')]")
If that doesn't work, check two things. An XPath that begins with // is evaluated against the whole document even when it is called on an element, so it has to begin with .// to stay inside the div. Also, another element farther up the document may have the same class name.
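The grouping idea can be sketched with the generic find_elements(by, value) call, which exists in both Selenium 3 and Selenium 4. This is an illustration under stated assumptions rather than tested code for the page in question: the function name group_texts_by_container is hypothetical, and the locator strings are written out as plain constants so that the snippet has no imports (they are the same string values that selenium's By.XPATH and By.CSS_SELECTOR hold).

```python
XPATH = "xpath"                # same string value as selenium's By.XPATH
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def group_texts_by_container(driver, container_css, link_xpath):
    """Return one list of link texts per container matched by container_css."""
    if link_xpath.startswith("/"):
        # An absolute XPath such as '//a' would search the whole page again
        # for every container, which flattens the grouping.
        raise ValueError("link_xpath must be relative, e.g. './/a[...]'")
    groups = []
    for container in driver.find_elements(CSS_SELECTOR, container_css):
        # The search is scoped to this container because the XPath is relative
        links = container.find_elements(XPATH, link_xpath)
        groups.append([link.text for link in links])
    return groups
```

With a live driver it might be called as group_texts_by_container(driver, "div.a-fixed-right-grid.a-spacing-top-medium", ".//a[contains(@href, '/gp/product/')]"). Links that only wrap an image have empty text, so some filtering of empty strings may still be needed.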

Selenium, Xpath, select a certain part of text within a node

I have a source file like this:
<div class="l_post j_l_post l_post_bright " ...>
<div class="lzl_cnt">
...
<span class="lzl_content_main">
text1
<a class="at j_user_card" username="...">
username
</a>
text3
</span>
</div>
...
</div>
And I want to get text3. Currently I have tried this (I am at <div class="lzl_cnt">):
driver.find_element(By.XPATH,'.//span[@class="lzl_content_main"]/text()[1]')
but I got
"Message: invalid selector: The result of the xpath expression
".//span[@class="lzl_content_main"]/text()[1]" is: [object Text]. It
should be an element".
Is there a way to get "text3"?
I should make it clearer:
The above HTML is part of a bigger structure, and I selected it with the following Python code:
for sel in driver.find_elements_by_css_selector('div.l_post.j_l_post.l_post_bright'):
    for i in sel.find_elements_by_xpath('.//div[@class="lzl_cnt"]'):
        #user1 = i.find_element_by_xpath('.//a[@class="at j_user_card "]').text
        try: user2 = i.find_element_by_xpath('.//span[@class="lzl_content_main"]/a[@username]').text
        except: user2 = ""
        text3 = ???
        print(user2, text3)
In Selenium you cannot use an XPath that returns attributes or text nodes, so the /text() syntax is not allowed. If you want only a specific child text node instead of the complete text content (returned by the text property), you can execute JavaScript.
You can apply the code below to get the required text node:
...
try: user2 = i.find_element_by_xpath('.//span[@class="lzl_content_main"]/a[@username]').text
except: user2 = ""
span = i.find_element_by_xpath('.//span[@class="lzl_content_main"]')
reply = driver.execute_script('return arguments[0].lastChild.textContent;', span)
You might also need to do reply = reply.strip() to get rid of leading and trailing whitespace.
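The lastChild approach depends on the wanted text being the very last child node of the span. A slightly more defensive variation is sketched below: it walks the child nodes backwards and returns the last text node that is not just whitespace. The helper name last_text_node and the embedded script are assumptions made for illustration, not part of the answer; execute_script(script, *args) is the only Selenium call used.

```python
# JavaScript run in the browser: find the last non-blank text node child
LAST_TEXT_NODE_JS = """
var nodes = arguments[0].childNodes;
for (var i = nodes.length - 1; i >= 0; i--) {
    var node = nodes[i];
    if (node.nodeType === Node.TEXT_NODE && node.textContent.trim() !== '') {
        return node.textContent;
    }
}
return '';
"""


def last_text_node(driver, element):
    """Return the stripped text of the last non-blank text node inside element."""
    raw = driver.execute_script(LAST_TEXT_NODE_JS, element)
    # Strip in Python as well, so surrounding newlines never reach the caller
    return (raw or "").strip()
```

Inside the loop from the question this would be used as text3 = last_text_node(driver, span).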
Yes:
//div[@class='lzl_cnt']
And then you should use .text on that element.
Except your span isn't closed, so I am assuming it closes before the div.
Here is a solution for you:
List<WebElement> list = driver.findElements(By.tagName("span"));
for (WebElement el : list) {
    String desiredText = el.getAttribute("innerHTML");
    if (desiredText.equalsIgnoreCase("text3")) {
        System.out.println("desired text found");
        break;
    }
}
Please try the above code and let me know your feedback.

How to find text with a particular value BeautifulSoup python2.7

I have the following HTML and I'm trying to save these values as variables: Available Now, 7, 148.89, Hatchback, Good. The problem I'm running into is that I can't pull them out independently, since they don't have a class attached to them. I'm wondering how to solve this. The HTML is below, followed by my futile attempt at solving it.
</div>
<div class="car-profile-info">
<div class="col-md-12 no-padding">
<div class="col-md-6 no-padding">
<strong>Status:</strong> <span class="statusAvail"> Available Now </span><br/>
<strong>Min. Booking </strong>7 Days ($148.89)<br/>
<strong>Style: </strong>Hatchback<br/>
<strong>Transmission: </strong>Automatic<br/>
<strong>Condition: </strong>Good<br/>
</div>
Python 2.7 Code: - this gives me the entire html!
soup=BeautifulSoup(html)
print soup.find("span",{"class":"statusAvail"}).getText()
for i in soup.select("strong"):
    if i.getText()=="Min. Booking ":
        print i.parent.getText().replace("Min. Booking ","")
Find all the strong elements under the div element with class="car-profile-info" and, for each element found, get the .next_siblings until you meet the br element:
from bs4 import BeautifulSoup, Tag
for strong in soup.select(".car-profile-info strong"):
    label = strong.get_text()
    value = ""
    # Collect everything between this <strong> and the next <br/>
    for elm in strong.next_siblings:
        if getattr(elm, "name", None) == "br":
            break
        if isinstance(elm, Tag):
            value += elm.get_text(strip=True)
        else:
            value += elm.strip()
    print(label, value)
You can use ".next_sibling" to navigate to the text you want like this:
for i in soup.select("strong"):
    if i.get_text(strip=True) == "Min. Booking":
        print(i.next_sibling)  # this will print: 7 Days ($148.89)
See also http://www.crummy.com/software/BeautifulSoup/bs4/doc/#going-sideways
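The sibling text still mixes the number of days and the price in one string. A minimal sketch for splitting it into separate values follows. It assumes the text always has the shape "N Days ($P)" seen in the sample HTML; the function name parse_min_booking and the regular expression are illustrative assumptions.

```python
import re

# Matches e.g. "7 Days ($148.89)": a day count, then a dollar price in parentheses
BOOKING_RE = re.compile(r"(\d+)\s*Days?\s*\(\$(\d[\d,]*(?:\.\d+)?)\)")


def parse_min_booking(text):
    """Return (days, price) parsed from the booking text, or None if it does not match."""
    match = BOOKING_RE.search(text)
    if match is None:
        return None
    days = int(match.group(1))
    # Drop thousands separators before converting the price
    price = float(match.group(2).replace(",", ""))
    return days, price


print(parse_min_booking("7 Days ($148.89)"))  # (7, 148.89)
```

The string passed in would be the value of i.next_sibling from the loop above.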

Looping over multiple tooltips

I am trying to get names and affiliations of authors from a series of articles from this page (you'll need to have access to Proquest to visualise it). What I want to do is to open all the tooltips present at the top of the page, and extract some HTML text from them. This is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
browser = webdriver.Firefox()
url = 'http://search.proquest.com/econlit/docview/56607849/citation/2876523144F544E0PQ/3?accountid=13042'
browser.get(url)
#insert your username and password here
n_authors = browser.find_elements_by_class_name('zoom') #zoom is the class name of the three tooltips that I want to open in my loop
author = []
institution = []
for a in n_authors:
    print(a)
    ActionChains(browser).move_to_element(a).click().perform()
    html_author = browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/a').get_attribute('innerHTML')
    html_institution = browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/p').get_attribute('innerHTML')
    author.append(html_author)
    institution.append(html_institution)
Although n_authors has three entries that are apparently different from one another, selenium fails to get the info from all tooltips, instead returning this:
author
#['Nuttall, William J.',
#'Nuttall, William J.',
#'Nuttall, William J.']
And the same happens for the institution. What am I getting wrong? Thanks a lot
EDIT:
The array containing the xpaths of the tooltips:
n_authors
#[<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{008a2ade-fc82-4114-b1bf-cc014d41c40f}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{c4c2d89f-3b8a-42cc-8570-735a4bd56c07}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{9d06cb60-df58-4f90-ad6a-43afeed49a87}")>]
Which has length 3, and the three elements are different, which is why I don't understand why selenium won't distinguish them.
EDIT 2:
Here is the relevant HTML
<span class="titleAuthorETC small">
<span style="display:none" class="title">false</span>
Jamasb, Tooraj
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_0" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a><script type="text/javascript">Tips.images = '/assets/r20161.1.0-4/pqc/javascript/prototip/images/prototip/';</script>; Nuttall, William J
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_1" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>; Pollitt, Michael G
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_2" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>.
UPDATE:
@parishodak's answer for some reason does not work in Firefox unless I manually hover over the tooltips first. It works with chromedriver, but only if I first hover over the tooltips and only if I allow time.sleep(), as in
import itertools
import time
from selenium.common.exceptions import NoSuchElementException
for i in itertools.count():
    try:
        tooltip = browser.find_element_by_xpath('//*[@id="resolverCitation_previewTrigger_' + str(i) + '"]')
        print(tooltip)
        ActionChains(browser).move_to_element(tooltip).perform()
    except NoSuchElementException:
        break
time.sleep(2)
elements = browser.find_elements_by_xpath('//*[@id="authorResolveLinks"]/li/div/a')
author = []
for e in elements:
    print(e)
    attribute = e.get_attribute('innerHTML')
    author.append(attribute)
It returns the same element because the XPath does not change between loop iterations, so find_element_by_xpath returns the first match every time.
There are two ways to deal with this:
Use index notation in the XPath, as shown below:
browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/a[1]').get_attribute('innerHTML')
browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/a[2]').get_attribute('innerHTML')
browser.find_element_by_xpath('//*[@id="authorResolveLinks"]/li/div/a[3]').get_attribute('innerHTML')
Or
Instead of find_element_by_xpath, use find_elements_by_xpath:
elements = browser.find_elements_by_xpath('//*[@id="authorResolveLinks"]/li/div/a')
Then loop over elements and call get_attribute('innerHTML') on each element in the loop.
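The second option can be extended so that each author stays paired with the matching institution. The sketch below iterates over the li items and looks up the a and p elements relative to each item. It assumes the structure implied by the XPaths in the question (one li per author under the element with id authorResolveLinks); the function name collect_authors is hypothetical, and the locator is written as a plain string with the same value as selenium's By.XPATH.

```python
XPATH = "xpath"  # same string value as selenium's By.XPATH


def collect_authors(driver, item_xpath='//*[@id="authorResolveLinks"]/li'):
    """Return a list of (author_html, institution_html) tuples, one per list item."""
    rows = []
    for item in driver.find_elements(XPATH, item_xpath):
        # Relative XPaths keep each lookup inside the current list item
        names = item.find_elements(XPATH, "./div/a")
        places = item.find_elements(XPATH, "./div/p")
        author = names[0].get_attribute("innerHTML") if names else None
        institution = places[0].get_attribute("innerHTML") if places else None
        rows.append((author, institution))
    return rows
```

Whether all tooltips are present in the page at the same time depends on the site, so this may still need the hover and wait steps from the question before it is called.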

select xpath of data in a tag using lxml

I am trying to select the "(6)" in the tag below:
<a class="itemRating" href="http://www.newegg.com/Product/ProductReview.aspx?Item=N82E16834200347" title="Rating + 4">
<span class="eggs r4"> </span>
(6)
</a>
The XPath result, which I will call review, comes from the expression in the () below:
review = site.xpath('/html/body/div[3]/div[2]/table/tr/td[2]/div/div[8]/div/div/div/a[3]')
When I try printing review[0].text, it prints 'None' instead of the (6).
Any ideas?
(6) is in the tail of the <span> element:
>>> a[0].tail
'\n(6)\n'
You can use:
review[0].text_content().strip()
or
review[0].xpath('string()').strip()
And I'd write your XPath as:
review = site.xpath('//a[@class="itemRating"]')
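The text/tail split can be seen in a small standalone example. To keep it runnable without lxml, this sketch uses the standard library's xml.etree.ElementTree, whose elements expose the same .text and .tail attributes. The snippet is a simplified copy of the tag from the question with the whitespace between tags removed, and "".join(a.itertext()) stands in for lxml's text_content().

```python
import xml.etree.ElementTree as ET

# Simplified version of the <a> tag from the question (attributes trimmed)
SNIPPET = '<a class="itemRating" title="Rating + 4"><span class="eggs r4"> </span>(6)</a>'

a = ET.fromstring(SNIPPET)
span = a[0]  # the first child of <a> is the <span>

print(repr(a.text))     # None: there is no text before the first child
print(repr(span.tail))  # '(6)': the text that follows </span> belongs to the span's tail
print("".join(a.itertext()).strip())  # (6)
```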
