XPath for anchor element not in certain parent element? - python

Using XPath, how can I get all anchor tags from the second paragraph except the ones in italics? (The question and example have been simplified. Imagine a regular HTML page with multiple <p> and <a> elements.)
<html><body>
<p>
A
<b>B</b>
<i>C</i>
</p>
<p>
<b>E</b>
F
<i>G</i>
</p>
</body></html>
Should get:
<a href="e.html">
<a href="f.html">
What I have:
root.xpath('//body//p')[1].xpath('a[not(self::i)]')
I am only getting:
`<a href="f.html">`

Try below XPath to get required output:
//p[2]//a[not(parent::i)]

As @Andersson commented, it's unclear where your a elements are supposed to end.
Assuming that your a elements are meant to be self-closing,
<html><body>
<p>
<a href="a.html"/>
<b><a href="b.html"/></b>
<i><a href="c.html"/></i>
</p>
<p>
<b><a href="e.html"/></b>
<a href="f.html"/>
<i><a href="g.html"/></i>
</p>
</body>
</html>
Then this XPath,
/html/body/p[2]//a[not(parent::i)]
selects all of the a descendants of the second paragraph whose parent is not an i element:
<a href="e.html"/>
<a href="f.html"/>
Credit: Thanks to @Andersson for a correction. Please upvote his answer. Thanks.
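As a quick check, here is a minimal sketch that runs the expression with lxml against the markup assumed in the answer above (self-closing a elements, parsed as XML). It also runs the attempt from the question to show why that returned only one link. The variable names are illustrative only.

```python
from lxml import etree

# Markup from the answer above; the self-closing a elements are the
# assumption made in that answer, not something the question confirms.
doc = etree.fromstring("""
<html><body>
<p>
<a href="a.html"/>
<b><a href="b.html"/></b>
<i><a href="c.html"/></i>
</p>
<p>
<b><a href="e.html"/></b>
<a href="f.html"/>
<i><a href="g.html"/></i>
</p>
</body></html>
""")

# Descendant a elements of the second p whose parent is not an i element.
wanted = [a.get("href") for a in doc.xpath("/html/body/p[2]//a[not(parent::i)]")]
print(wanted)  # ['e.html', 'f.html']

# The attempt from the question: "a" uses the child axis, so only direct
# children of the p are considered, and self::i tests the a element itself.
attempt = [a.get("href") for a in doc.xpath("//body//p")[1].xpath("a[not(self::i)]")]
print(attempt)  # ['f.html']
```

The difference comes from two things: // reaches a elements at any depth below the paragraph, and parent::i inspects the element that contains each a, whereas self::i can never be true for an a element.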

Related

Second element on selenium chromedriver

While automating our application, I ran into div elements that have the same attributes.
I can't find a way to tell them apart. Please let me know what other ways there are to identify these elements with Selenium ChromeDriver in Python.
I tried the following, but it doesn't work for me:
driver.find_element_by_xpath("(//div[@class='c3'])[2]/p").send_keys('text')
This is my HTML:
<div class="c3">
<p> test1 </p>
</div>
<div class="c3">
<p> test2 </p>
</div>
<div class="c3">
<p> test3 </p>
</div>
I want to add my text after test2.
The problem is the extra parentheses, or parentheses in the wrong place, in your XPath.
The correct XPath would be:
driver.find_element_by_xpath("//div[@class='c3'][2]/p").send_keys('text')
OR
driver.find_element_by_xpath("(//div[@class='c3']/p)[2]").send_keys('text')
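Where the parentheses sit changes what the [2] counts. The sketch below uses lxml rather than Selenium so that it runs without a browser; the XPath semantics are the same. The first tree follows the markup in the question (whitespace trimmed), while the second tree and its section wrappers are hypothetical, added only to show where the variants diverge.

```python
from lxml import etree

# Markup from the question: the three divs are siblings.
flat = etree.fromstring("""
<body>
<div class="c3"><p>test1</p></div>
<div class="c3"><p>test2</p></div>
<div class="c3"><p>test3</p></div>
</body>
""")

# Hypothetical variant: the same divs spread over two wrapper elements.
nested = etree.fromstring("""
<body>
<section><div class="c3"><p>test1</p></div></section>
<section>
<div class="c3"><p>test2</p></div>
<div class="c3"><p>test3</p></div>
</section>
</body>
""")

expressions = [
    "(//div[@class='c3'])[2]/p",  # second matching div in the whole document
    "//div[@class='c3'][2]/p",    # second matching div within each parent
    "(//div[@class='c3']/p)[2]",  # second matching p in the whole document
]

def texts(tree, expr):
    """Return the text of every element selected by expr."""
    return [el.text for el in tree.xpath(expr)]

flat_results = [texts(flat, e) for e in expressions]
nested_results = [texts(nested, e) for e in expressions]
print(flat_results)    # [['test2'], ['test2'], ['test2']]
print(nested_results)  # [['test2'], ['test3'], ['test2']]
```

For sibling divs like those in the question, all three forms select the same p. They differ only when the matching divs do not share a parent, because an unparenthesized [2] is evaluated per parent.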

Python: XPATH search within node

I have HTML that looks roughly like this (shortened):
<div id="activities" class="ListItems">
<h2>Standards</h2>
<ul>
<li>
<a class="Title" href="http://www.google.com" >Guidelines on management</a>
<div class="Info">
<p>
text
</p>
<p class="Date">Status: Under development</p>
</div>
</li>
</ul>
</div>
<div class="DocList">
<h3>Reports</h3>
<p class="SupLink">+ <a href="http://www.google.com/test" >View More</a></p>
<ul>
<li class="pdf">
<a class="Title" href="document.pdf" target="_blank" >Document</a>
<span class="Size">
[1,542.3KB]
</span>
<div class="Info">
<p>
text <a href="http://www.google.com" >Read more</a>
</p>
<p class="Date">
14/03/2018
</p>
</div>
</li>
</ul>
</div>
I am trying to select the value of 'href=' under 'a class="Title"' by using this code:
import requests
from lxml import html

def sub_path02(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    url2 = []
    for node in tree.xpath('//a[@class="Title"]'):
        url2.append(node.get("href"))
    return url2
But I get two results: the one under 'div class="DocList"' is also returned.
I am trying to change my XPath expression so that it only looks within one node, but I cannot get it to work.
Could someone please help me understand how to "search" within a specific node? I have gone through multiple XPath documentation pages but I cannot seem to figure it out.
Using // you are already selecting all the a elements in the document.
To search within a specific div, select that div first with // and then use //a again to look anywhere inside it:
//div[@class="ListItems"]//a[@class="Title"]
for node in tree.xpath('//div[@class="ListItems"]//a[@class="Title"]'):
    url2.append(node.get("href"))
Try this XPath expression, which selects the div by its id and then searches all of its descendants:
'//div[@id="activities"]//a[@class="Title"]'
so:
def sub_path02(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    url2 = []
    for node in tree.xpath('//div[@id="activities"]//a[@class="Title"]'):
        url2.append(node.get("href"))
    return url2
Note:
It's always better to select by id than by class, because an id should be unique (in real life there is sometimes bad code that repeats the same id within a page), whereas a class can be repeated any number of times.
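To see the scoping in action without a network request, here is a minimal sketch that parses a shortened copy of the question's markup from a string. It also shows a two-step variant: fetch the node first, then search relative to it with a leading dot. The variable names are illustrative.

```python
from lxml import html

# Shortened version of the markup from the question.
page = """
<html><body>
<div id="activities" class="ListItems">
  <ul><li><a class="Title" href="http://www.google.com">Guidelines on management</a></li></ul>
</div>
<div class="DocList">
  <ul><li class="pdf"><a class="Title" href="document.pdf">Document</a></li></ul>
</div>
</body></html>
"""
tree = html.fromstring(page)

# Unscoped: matches the Title links in both divs.
everywhere = [a.get("href") for a in tree.xpath('//a[@class="Title"]')]

# Scoped in a single expression.
scoped = [a.get("href") for a in tree.xpath('//div[@id="activities"]//a[@class="Title"]')]

# Scoped in two steps: the leading "." keeps the search inside the node.
activities = tree.xpath('//div[@id="activities"]')[0]
relative = [a.get("href") for a in activities.xpath('.//a[@class="Title"]')]

print(everywhere)  # ['http://www.google.com', 'document.pdf']
print(scoped)      # ['http://www.google.com']
print(relative)    # ['http://www.google.com']
```

Note that calling activities.xpath('//a[@class="Title"]') without the dot would still search the whole document, since an expression starting with // is always evaluated from the document root.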

scrapy xpath how to use?

Hi all,
I have a question about Scrapy selectors and XPath.
I would like to select the link in the "a" tag inside the last "li" tag in the HTML. How do I write the XPath query for that?
This is what I did, but I believe there is a simpler way that uses only an XPath query instead of list slicing. I just don't know how to write it.
from scrapy import Selector
sel = Selector(text=html)
print(sel.xpath('(//ul/li)').xpath('a/@href').extract()[-1])
'''
html
'''
<ul>
<li>
<a href="/info/page/" rel="follow">
<span class="page-numbers">
35
</span>
</a>
</li>
<li>
<a href="/info/page/" rel="follow">
<span class="next">
next page.
</span>
</a>
</li>
</ul>
I am assuming you specifically want the link to the "next" page. If so, you can locate the a element by checking that its child span has the "next" class:
//a[span/@class = "next"]/@href
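If the goal really is "the link in the last li" regardless of its class, the XPath last() function can replace the list slicing. The sketch below compares both approaches. It uses lxml instead of Scrapy so that it runs standalone, and the href values are hypothetical (the question shows the same href for both links), chosen so the two links can be told apart.

```python
from lxml import html

# Pagination list modelled on the question; the href values are
# hypothetical so that the two links can be told apart.
snippet = """
<ul>
<li><a href="/info/page/35/" rel="follow"><span class="page-numbers">35</span></a></li>
<li><a href="/info/page/36/" rel="follow"><span class="next">next page.</span></a></li>
</ul>
"""
tree = html.fromstring(snippet)

# Select by the class of the child span, as in the answer above.
by_class = tree.xpath('//a[span/@class = "next"]/@href')

# Select by position: the a inside the last li of the list.
by_position = tree.xpath('(//ul/li)[last()]/a/@href')

print(by_class)     # ['/info/page/36/']
print(by_position)  # ['/info/page/36/']
```

The same expressions should work unchanged in sel.xpath(...) with a Scrapy Selector. Selecting by class is usually the more robust choice, since the last li is not guaranteed to be the "next" link on the final page.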

Searching text knowing <i tag class

I need to get the text of the div with class _50x4, locating it via the 5pxsel class:
<div...>
<i class="5pxsel">
<div>
<div>
<div class="_50x4">
Work in
<a>London</a>
<div class="_50x4">
Work in
<a> Germany </a>
I need to get the text using class 5pxsel, not _50x4, and get only the first result: 'Work in London'.
Try the following XPath:
//*[@class="5pxsel"]/following-sibling::div/div/div[@class='_50x4']

handling deeply nested tags in xpath

Please help me!
I don't know how to select a deeply nested tag in order to get the text inside it.
Could someone show me how to do it in a single line with an XPath query, and explain the answer?
Below is some HTML. Can anybody explain how to display "Hello world!", or whatever else may be in those tags?
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div class="deep">
<span>
<strong class="select">Hello world!</strong>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
I assume, since you asked for the text, that the node you'd like to match is the strong tag (the only one with content).
If you are guaranteed only one <strong> tag in the document and the level of nesting is irrelevant, the simplest XPath would be:
//strong/text()
To match by class specifically as well:
//strong[@class="select"]/text()
// searches the whole document from the root, at any depth, and @ refers to an attribute.
http://www.b624.net/modelare-software-uml-si-xml/laboratoare-an-3-is/xpath-cheat-sheet
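Both expressions can be tried with lxml in a few lines. This is only a sketch: the ten wrapper divs from the question are generated in code for brevity, and the variable names are illustrative.

```python
from lxml import html

# Ten wrapper divs around the div.deep from the question, built in code.
inner = '<div class="deep"><span><strong class="select">Hello world!</strong></span></div>'
markup = "<div>" * 10 + inner + "</div>" * 10
tree = html.fromstring(markup)

# Any strong element, at any depth.
any_strong = tree.xpath("//strong/text()")

# Only strong elements whose class attribute is exactly "select".
by_class = tree.xpath('//strong[@class="select"]/text()')

print(any_strong)  # ['Hello world!']
print(by_class)    # ['Hello world!']
```

Because // ignores the nesting depth, the expression stays the same no matter how many wrapper divs surround the target.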
