I recently began learning XPath for a Python project, but I can't get the following expression to select the correct piece of data.
//table[@id="yfncsumtab"]//tr/td/a[@rel="first"]
The data is on this page: http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices
(Inspect the "Next" link to see the element I'm trying to target with XPath. In other words, Command/Ctrl+F for "Next" on that page and inspect the first result.)
I've tried many variations of that code, but none seem to select the proper text. I appreciate any and all help - thanks in advance!
'//a[text()="Next"]'
or:
'//table[@id = "yfncsumtab"]//a[text()="Next"]'
or, to get just the first one:
'//table[@id = "yfncsumtab"]//table[1]/tr/td/a[text()="Next"]'
or:
'//table[@id="yfncsumtab"]/tr[2]/td[1]/table[1]/tr/td/a[1]'
The more specific you are, the faster the element is found. However, the more specific you are, the more brittle the XPath becomes: if the developers make a small change to the HTML structure surrounding the target element, your code will break.
from lxml import html

doc = html.parse("http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices")
my_xpath = '//a[text()="Next"]'

for element in doc.xpath(my_xpath):
    print("<{}>".format(element.tag))
    print(" text = {}".format(element.text))
    for attr, val in element.items():
        print(" {} = {}".format(attr, val))
--output:--
<a>
text = Next
rel = next
href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
<a>
text = Next
rel = next
href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
Try this one:
//*[(@id = "yfncsumtab")]//a[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]
With this XPath I get both the top and the bottom 'Next' links.
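For reference, a minimal lxml sketch (assuming the same page as above) that evaluates this expression and prints what it matches:
from lxml import html

doc = html.parse("http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices")
xpath = '//*[@id = "yfncsumtab"]//a[(count(preceding-sibling::*) + 1) = 3 and parent::*]'

for link in doc.xpath(xpath):
    # Print the link text and target for both matches (top and bottom "Next")
    print(link.text, link.get("href"))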
Here's my code:
def start_server(server_ip=None):
    content = driver.page_source
    root = html.fromstring(content)
    tree = root.getroottree()
    result = root.xpath('//*[. = "MythicalNoobsSMP"]')
    print(root, tree, result, end="\n")

start_server('MythicalNoobsSMP')
What I basically want is: on aternos.org, open the server profile requested by the function's parameter. Here is the list of my servers:
I want to open the MythicalNoobsSMP server, so how do I get its XPath? (Note: I don't want to get it from the dev tools → Inspect.)
So you already have this, right?
result = root.xpath('//*[. = "MythicalNoobsSMP"]')
Now you need to check what kind of element it is: whether it's a span or an a.
Then you can build a generic XPath of the form:
//elementname[text()='Your specific text']
For instance, to find a span with the value "MythicalNoobsSMP":
//span[text()='MythicalNoobsSMP']
This pattern will work for any server name you look up in the future.
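As a rough sketch (assuming lxml and the Selenium driver from the question; the helper name is just illustrative), the generic pattern can be parameterised like this:
from lxml import html

def find_server_element(page_source, server_name):
    root = html.fromstring(page_source)
    # Swap "span" for "a" depending on what the inspected element actually is.
    return root.xpath('//span[text()="{}"]'.format(server_name))

# matches = find_server_element(driver.page_source, "MythicalNoobsSMP")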
Sadly, I couldn't find any resources online for my problem. I'm trying to store elements found by XPath in a list and then search within each of those elements. But instead of searching within the given element, Selenium always seems to search the whole page again.
Does anyone have good knowledge about this? I've read that:
// Selects nodes in the document from the current node that match the selection no matter where they are
But I've also tried "/" and it didn't work either.
Instead of giving me the text for each div, it gives me the text from all divs.
My Code:
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# I'm looking for all divs with a specific class and store them in a list
divs_found = driver.find_elements_by_xpath("//div[@class='a-fixed-right-grid-col a-col-left']")
# Here seems to be the problem: instead of searching within "divs_found[1]", it behaves like "driver" and searches the whole site
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
# Now I'm looking in the found href matches to store the text from each
for href in hrefs_matching_in_div:
    result_text.append(href.text)
print(result_text)
You need to add a leading . so the XPath is evaluated relative to the current element rather than the whole document. Try now:
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")
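To make the difference concrete, a short sketch reusing the selectors from the question (and the same older find_elements_by_xpath API):
container = divs_found[1]

# A leading "//" restarts the search at the document root, so this returns matching links from the entire page:
all_links = container.find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")

# A leading ".//" anchors the search at `container`, so only links inside that particular div are returned:
scoped_links = container.find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")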
I am having A LOT of trouble getting elements on the webpage using XPath. I need to get the text on the left as well as the right. There are 7 classes, so there will be 7 of these.
Within these divs, this is what it looks like:
I just need the text which corresponds to the first photo.
Below is just ONE attempt:
result = session_requests.get(url, headers=dict(referer=url))
tree = html.fromstring(result.content)
grades = tree.xpath(".//div[@class='AssignmentClass'][1]//text()")
print(grades)
XPath is so powerful because it's a syntax for describing both a path to the data and the data itself.
In this case, you would end your path with text() because that's what you want.
tree.xpath(".//div[@class='AssignmentClass'][1]//text()")
You can use the following XPath
To get the text from the <a> tags:
//div[@class='AssignmentClass']//a/text()
To get the text from the <span> tags:
//div[@class='AssignmentClass']//span[2]/text()
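A small lxml sketch (assuming the same tree object as in the question) that pairs the two lists up, one entry per class:
left_text = tree.xpath("//div[@class='AssignmentClass']//a/text()")
right_text = tree.xpath("//div[@class='AssignmentClass']//span[2]/text()")

# Walk the two result lists in step, one pair per AssignmentClass div
for left, right in zip(left_text, right_text):
    print(left, "->", right)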
I am using Python with XPath and am getting lost in the XPath syntax. What I would like to do is check whether a certain tag is absent from a table cell in an HTML page, so I am using XPath for that. Then, if that tag isn't there, do an XPath search relative to that section. I seem to have something working, but it kind of does the reverse, and I can't figure out why. Example code is below.
main_sections = tree.xpath('//td[@class="cars"]')
for i in range(0, len(main_sections)):
    has_no_flag = True
    for c in main_sections[i].getchildren():
        if c.tag == "span" and c.get("class") == "colorRed":
            has_no_flag = False
    if has_no_flag:
        price = main_sections[i].xpath('//td[@class="cars"]/following-sibling::td[@class="price"]/span[@class="amount-value"]')
        price_str = price[0].text.strip()
I don't think the XPath for price is correct. Hopefully someone will be able to enlighten me :)
I don't think you're using XPath correctly here.
Filter the nodes you want directly in the expression and throw out your own loops and flags.
cars_without_tag_price = '''//td[
    @class="cars" and not(span[@class="colorRed"])
]/following-sibling::td[@class="price"]/span[@class="amount-value"]
'''
for price_node in tree.xpath(cars_without_tag_price):
    price_str = price_node.text.strip()
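For example, the cleaned prices could be collected in one pass (assuming the same lxml tree object as in the question):
prices = [node.text.strip() for node in tree.xpath(cars_without_tag_price)]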
I wish to get text from an HTML page using XPath.
The particular text is in the td to the right of "Description:" (which sits inside a th element) on the page at the URL in the source.
In the first call (commented out) I tried the absolute XPath copied from the Chrome inspector, but I get an empty list.
The next call works and gives the heading:
"Description:"
I require a generic XPath query that takes a text heading (like "Description:") and gives the text value of the td next to it.
import requests
from lxml import html

url = 'http://datrack.canterbury.nsw.gov.au/cgi/datrack.pl?cmd=download&id=ZiFfLxV6W1xHWBN1UwR5SVVSAV0GXUZUcGFGHhAyTykQAG5CWVcARwM='
page = requests.get(url)
tree = html.fromstring(page.content)
# desc = tree.xpath('//*[#id="documentpreview"]/div[1]/table[1]/tbody/tr[2]/td//text()')
desc = tree.xpath("//text()[contains(., 'Description:')]")
I have tried variations of XPath queries but my knowledge is not deep enough.
Any help would be appreciated.
Use //*[contains(text(), 'Description:')] to find tags whose text contains Description:, and use following-sibling::td to find following siblings which are td tags:
In [180]: tree.xpath("//*[contains(text(), 'Description:')]/following-sibling::td/text()")
Out[180]: ['Convert existing outbuilding into a recreational area with bathroom and kitchenette']
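Building on that, a small sketch of the generic lookup the question asks for (the helper name is just illustrative, and it assumes the heading text appears in exactly one th per value):
def field_value(tree, heading):
    # Return the text of the <td> that follows the element containing `heading`.
    xpath = "//*[contains(text(), '{}')]/following-sibling::td/text()".format(heading)
    values = tree.xpath(xpath)
    return values[0].strip() if values else None

# field_value(tree, 'Description:')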