XPath - Search within XPath results - Python

I am using Python with XPath and am getting lost in the XPath syntax. What I would like to do is check whether a certain tag is absent from a table cell in an HTML page, and I am using XPath to do this. If the tag isn't there, I then do an XPath search relative to that section. I seem to have something working, but it does kind of the reverse, and I can't figure out why. Example code is below.
main_sections = tree.xpath('//td[@class="cars"]')
for i in range(0, len(main_sections)):
    has_no_flag = True
    for c in main_sections[i].getchildren():
        if c.tag == "span" and c.get("class") == "colorRed":
            has_no_flag = False
    if has_no_flag:
        price = main_sections[i].xpath('//td[@class="cars"]/following-sibling::td[@class="price"]/span[@class="amount-value"]')
        price_str = price[0].text.strip()
I don't think the XPath for price is correct. Hopefully someone will be able to enlighten me :)

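I don't think you're using XPath correctly here. An expression that starts with // is evaluated from the document root even when you call .xpath() on an element, so price[0] is always the price of the first car in the whole document, not of the current one.
Just filter the nodes you want in the XPath itself and throw out your own loops and flags: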
cars_without_tag_price = '''//td[
    @class="cars" and not(span[@class="colorRed"])
]/following-sibling::td[@class="price"]/span[@class="amount-value"]
'''
for price_node in tree.xpath(cars_without_tag_price):
    price_str = price_node.text.strip()
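If lxml isn't available, roughly the same filter can be sketched with the standard library's xml.etree.ElementTree. It supports only a small XPath subset (no not() and no following-sibling:: axis), so this sketch walks the table rows instead. It assumes well-formed markup and that each price cell sits in the same <tr> as its cars cell; the sample table and the prices_without_flag helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (real pages usually need an HTML parser such as lxml.html).
SAMPLE = """
<table>
  <tr>
    <td class="cars">Car A <span class="colorRed">sold</span></td>
    <td class="price"><span class="amount-value"> 100 </span></td>
  </tr>
  <tr>
    <td class="cars">Car B</td>
    <td class="price"><span class="amount-value"> 200 </span></td>
  </tr>
  <tr>
    <td class="cars">Car C</td>
    <td class="price"><span class="amount-value"> 300 </span></td>
  </tr>
</table>
"""


def prices_without_flag(root):
    """Return price strings for rows whose cars cell has no span.colorRed child."""
    prices = []
    for row in root.iterfind(".//tr"):
        cars = row.find("td[@class='cars']")
        if cars is None:
            continue
        # Equivalent of not(span[@class="colorRed"]) in the lxml expression.
        if cars.find("span[@class='colorRed']") is not None:
            continue
        # Searched relative to the current row, so only this row's price is returned.
        for amount in row.iterfind("td[@class='price']/span[@class='amount-value']"):
            prices.append(amount.text.strip())
    return prices


root = ET.fromstring(SAMPLE.strip())
print(prices_without_flag(root))  # ['200', '300']
```

The idea is the same as in the lxml answer: every lookup is made relative to the current row, never from the document root.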

Related

Getting href with Selenium and Python

I am trying to get the href with selenium and python.
This is my page:
Some of the class names change from element to element, so I am basically trying to get every href for <a id="job____ .....
links.append(job.find_element_by_xpath('//a[@aria-live="polite"]//span').get_attribute(name="href"))
I tried a couple of things but can't figure out how. How can I get all the hrefs from the screenshot above?
Try this, but be careful with your XPath:
"//a[@aria-live="polite"]//span"
selects a span, and I don't see any span with an href in your HTML. Maybe this XPath solves it:
//a[./span[@aria-live="polite"]]
links.append(job.find_element_by_xpath('//a[./span[@aria-live="polite"]]').get_attribute("href"))
But that only returns the first match. To get all URLs, use find_elements (which returns a list) and extend your URL list with a list comprehension:
links.extend([x.get_attribute("href") for x in job.find_elements_by_xpath('//a[./span[@aria-live="polite"]]')])
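Edit 1: another XPath solution (note that get_attribute("href") usually returns the already-resolved absolute URL, so the "website_base_url" prefix may not be needed):
links.extend(["website_base_url"+x.get_attribute("href") for x in job.find_elements_by_xpath('//a[contains(@id, "job_")]')])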
list_of_elements_with_href = wd.find_elements_by_xpath("//a[@href]")
for el_with_href in list_of_elements_with_href:
    links.append(el_with_href.get_attribute("href"))
or, if you need to be more specific:
list_of_elements_with_href = wd.find_elements_by_xpath("//a[@href and contains(@id,'job_')]")
Based on your description and attached image, I think you have got the wrong xpath. Try the following code.
find_links = driver.find_elements_by_xpath("//a[starts-with(@id,'job_')]")
links = []
for link in find_links:
    links.append(link.get_attribute("href"))
Please note that it is find_elements_by_xpath (elements, plural) rather than find_element_by_xpath.
I am unable to test this solution as you have not provided the website.
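Selenium needs a live browser, so to try the starts-with(@id, 'job_') selection offline, here is a stdlib-only sketch using html.parser. The sample markup, the ids and the JobLinkCollector class are hypothetical stand-ins for the page in the question. The collector returns raw href values as written in the markup, whereas Selenium's get_attribute("href") typically returns the resolved absolute URL.

```python
from html.parser import HTMLParser


class JobLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose id starts with a given prefix.

    Mirrors the XPath //a[starts-with(@id,'job_')] from the answer above.
    """

    def __init__(self, prefix="job_"):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Valueless attributes come through as None, hence the "or".
        element_id = attributes.get("id") or ""
        href = attributes.get("href")
        # Keep only anchors with a matching id that also carry an href.
        if element_id.startswith(self.prefix) and href is not None:
            self.links.append(href)


# Hypothetical markup standing in for the page in the question.
SAMPLE = """
<div>
  <a id="job_1" class="x1" href="/jobs/1"><span aria-live="polite">Job 1</span></a>
  <a id="job_2" class="y7" href="/jobs/2"><span aria-live="polite">Job 2</span></a>
  <a id="nav_home" href="/">Home</a>
</div>
"""

collector = JobLinkCollector()
collector.feed(SAMPLE)
print(collector.links)  # ['/jobs/1', '/jobs/2']
```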

XPATH syntax for element in {} with Selenium for Python

I am currently trying to click each button on a webpage with Selenium in Python. The class and text are always the same for each button, but each button has a different id. The ids, however, are inside "data-parameters" in {}, and I can't figure out the correct syntax for the XPath.
Here is a snippet of the website for one of the buttons:
<span class="contains-icon-details gc-btn gc-btn--s" data-isneededpromise="false" data-parameters="{"partner":"gs", "realId": "8da1d6a9-44d1-4556-bc12-92699749a30a", "tnId": "102086182829", "type": "details"}">More Details</span>
It seems the realId and the tnId are unique, so I would need to find the buttons with either one of those.
This works:
driver.find_element_by_xpath("//span[@class='contains-icon-details gc-btn gc-btn--s']").click()
but of course only for the first button as the class is always the same.
I tried something like this:
driver.find_element_by_xpath("//*[contains(@tnId, '102086182829')]").click()
but I get
Unable to locate element: //*[contains(@tnId, '102086182829')]
so definitely not the correct syntax.
I tried to find a solution online, but with no luck so far. Can anybody point me in the right direction? Thanks in advance.
If the realId or tnId value is unique, your XPath can be
driver.find_element_by_xpath("//*[contains(@data-parameters, '8da1d6a9-44d1-4556-bc12-92699749a30a')]").click()
or
driver.find_element_by_xpath("//*[contains(@data-parameters, '102086182829')]").click()
You should filter by the "data-parameters" attribute.
Try
driver.find_element_by_xpath("//span[contains(@data-parameters, '102086182829')]").click()
This is a quick-and-dirty implementation of what you need. It would be better to extract the data-parameters field, deserialize the JSON and check for the needed field:
import json
spans = driver.find_elements_by_xpath("//span[@class='contains-icon-details gc-btn gc-btn--s']")
for span in spans:
    data_parameters = span.get_attribute("data-parameters")
    try:
        data_parameters = json.loads(data_parameters)
    except (TypeError, ValueError):  # attribute missing or not valid JSON
        continue
    if 'tnId' in data_parameters and data_parameters['tnId'] == "102086182829":
        span.click()
        break
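The deserialize-and-check approach can be tried without a browser using the standard library's html.parser and json modules. This sketch assumes the served HTML escapes the inner quotes of data-parameters as &quot; (valid HTML cannot contain raw double quotes inside a double-quoted attribute); html.parser unescapes attribute values before json.loads sees them. DetailsButtonFinder is a hypothetical helper, not part of Selenium.

```python
import json
from html.parser import HTMLParser


class DetailsButtonFinder(HTMLParser):
    """Collect the decoded data-parameters JSON of every <span> that has it."""

    def __init__(self):
        super().__init__()
        self.parameters = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        raw = dict(attrs).get("data-parameters")
        if raw is None:
            return
        try:
            self.parameters.append(json.loads(raw))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass  # skip attributes that are not valid JSON


# Assumed served form of the snippet from the question, with &quot; escapes.
SAMPLE = (
    '<span class="contains-icon-details gc-btn gc-btn--s" data-isneededpromise="false" '
    'data-parameters="{&quot;partner&quot;:&quot;gs&quot;, '
    '&quot;realId&quot;: &quot;8da1d6a9-44d1-4556-bc12-92699749a30a&quot;, '
    '&quot;tnId&quot;: &quot;102086182829&quot;, &quot;type&quot;: &quot;details&quot;}">'
    'More Details</span>'
)

finder = DetailsButtonFinder()
finder.feed(SAMPLE)
# Pick the button whose tnId matches exactly (no substring false positives).
match = next(p for p in finder.parameters if p.get("tnId") == "102086182829")
print(match["realId"])  # 8da1d6a9-44d1-4556-bc12-92699749a30a
```

Unlike contains(@data-parameters, ...), comparing the parsed field cannot accidentally match the id appearing inside some other value.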

Selenium Python - Store XPath results in a variable and search a deeper hierarchy from that variable

I sadly couldn't find any resources online for my problem. I'm trying to store elements found by XPath in a list and then loop over those elements to search inside each one. But instead of searching inside the given element, Selenium always seems to search the whole page again.
Does anyone have good knowledge about this? I've seen that:
// Selects nodes in the document from the current node that match the selection no matter where they are
But I've also tried "/" and it didn't work either.
Instead of giving me the text for each div, it gives me the text from all divs.
My Code:
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# I'm looking for all divs with a specific class and storing them in a list
divs_found = driver.find_elements_by_xpath("//div[@class='a-fixed-right-grid-col a-col-left']")
# This seems to be the problem: instead of searching inside divs_found[1], it behaves like "driver" and searches the whole page
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
# Now I'm looping over the matching links to store their text
for href in hrefs_matching_in_div:
    result_text.append(href.text)
print(result_text)
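You need to prefix the expression with . so that it is evaluated relative to the element: .//a searches the element's descendants, whereas //a always starts from the document root. Try now:
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")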
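The effect of the leading dot can be reproduced with the standard library's xml.etree.ElementTree on made-up, well-formed markup. ElementTree has no contains() function, so this sketch does the href check in Python; the point is that .//a started from divs_found[1] only looks at that div's descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two result blocks, each holding product links.
SAMPLE = """<html><body>
  <div class="a-fixed-right-grid-col a-col-left">
    <a href="/gp/product/111">First product</a>
  </div>
  <div class="a-fixed-right-grid-col a-col-left">
    <a href="/gp/product/222">Second product</a>
    <a href="/help">Help</a>
  </div>
</body></html>"""

root = ET.fromstring(SAMPLE)
divs_found = root.findall(".//div[@class='a-fixed-right-grid-col a-col-left']")

# ".//a" is evaluated relative to divs_found[1], so only its descendants are searched.
# The substring test on href replaces XPath's contains().
result_text = [
    a.text
    for a in divs_found[1].iterfind(".//a")
    if "/gp/product/" in a.get("href", "")
]
print(result_text)  # ['Second product']
```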

Want to extract a decimal number from a page with XPath, Selenium WebDriver in Python

I have a page with an item price, and I want to extract this price as 64.99. What would the XPath be to get this number? I'm using Selenium WebDriver to find the price.
I have tried a lot of XPath permutations, but the problem is that this page has a lot of such products, so it is difficult to find a unique XPath for that price, e.g.:
//li[@class = 'price-current'] (gives 13 results on the page)
//*[@id = 'landingpage-price' and @class = 'price-current'] (gives no result)
Any help will be appreciated. Thanks
Since you mentioned there are a lot of such products, you are asking the wrong question. You need to work out how to get to the product you are interested in and then find its price, instead of trying to find the price directly.
Now, the issue with the XPath below
//*[@id = 'landingpage-price' and @class = 'price-current'] (gives no result)
is that both conditions are applied to the same element: the price-current item lives inside the landingpage-price container, but the expression asks for a single element that has both that id and that class. I would suggest doing this with CSS, but I will show both XPath and CSS.
XPath
elem = driver.find_element_by_xpath("//div[@id = 'landingpage-price']//li[@class = 'price-current']")
print(elem.text.replace("$", ""))
CSS
elem = driver.find_element_by_css_selector("#landingpage-price .price-current")
print(elem.text.replace("$", ""))
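Your XPath would break if the developers add more classes to the price element, because @class = 'price-current' compares the whole attribute value, while the CSS class selector .price-current still matches as long as price-current is one of the classes. So CSS is the better choice here, and it does work: it uniquely identified the element on the page.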
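Stripping only "$" leaves anything else in the element's text (thousands separators, extra words) in place. A slightly more defensive sketch, assuming a dot as the decimal separator and commas as thousands separators, pulls out the first number with a regular expression and returns a Decimal to avoid float rounding; parse_price is a hypothetical helper name.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract the first decimal number from a price string such as '$64.99'.

    Commas (assumed thousands separators) are removed first.
    Returns None if no number is found.
    """
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return Decimal(match.group()) if match else None


print(parse_price("$64.99"))         # 64.99
print(parse_price("Now $1,064.99"))  # 1064.99
```

With Selenium you would call it as parse_price(elem.text) after locating the element with either selector above.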

XPath: Assessing an Error in this Line of Code?

I recently began learning XPath for a Python project, but I can't get the following line to select the correct piece of data.
//table[@id="yfncsumtab"]//tr/td/a[@rel="first"]
Said data is found on this page: http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices
(Inspect Element the "Next" link to get to the code I'm attempting to create an XPath to. In other words, Command/Control F on that page, and Inspect Element the first result)
I've tried many variations of that code, but none seem to select the proper text. I appreciate any and all help - thanks in advance!
'//a[text()="Next"]'
or:
'//table[@id = "yfncsumtab"]//a[text()="Next"]'
or, to get just the first one:
'//table[@id = "yfncsumtab"]//table[1]/tr/td/a[text()="Next"]'
or:
'//table[@id="yfncsumtab"]/tr[2]/td[1]/table[1]/tr/td/a[1]'
The more specific you are, the faster it is to find the element. However, the more specific you are, the more brittle the xpath is: if the developers make a small change in the html structure surrounding the target element, your code won't work.
from lxml import html
doc = html.parse("http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices")
my_xpath = '//a[text()="Next"]'
for element in doc.xpath(my_xpath):
    print("<{}>".format(element.tag))
    print(" text = {}".format(element.text))
    for attr, val in element.items():
        print(" {} = {}".format(attr, val))
--output:--
<a>
text = Next
rel = next
href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
<a>
text = Next
rel = next
href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
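The trade-off between loose and specific expressions can be tried offline with the standard library's xml.etree.ElementTree, whose limited XPath subset includes [.='text'] for matching an element's full text (Python 3.7+). The markup below is a made-up simplification, not Yahoo's actual page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two "Next" links inside the table, one outside it.
SAMPLE = """<html><body>
  <table id="yfncsumtab">
    <tr><td><table><tr><td><a rel="next" href="/page2">Next</a></td></tr></table></td></tr>
    <tr><td><table><tr><td><a rel="next" href="/page2">Next</a></td></tr></table></td></tr>
  </table>
  <a href="/other">Next</a>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Loose: every <a> whose text is exactly "Next", anywhere in the document.
loose = root.findall(".//a[.='Next']")
# Scoped: only the "Next" links inside the table with id="yfncsumtab".
scoped = root.findall(".//table[@id='yfncsumtab']//a[.='Next']")

print(len(loose), len(scoped))  # 3 2
```

The scoped expression ignores the unrelated link outside the table, but it stops matching if the table's id changes.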
Try this one:
//*[(@id = "yfncsumtab")]//a[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]
With this Xpath I get both the top and the bottom 'Next' link.
