How do I locate text within the same XPath?
I tried this, but it does not work:
//div[contains(text(),"Review") and contains(text(),"received"]
The two texts belong to two different tags. You can look for the element with the "Review" text that has a child with the "received" text:
//div[contains(text(),"Review") and div[contains(text(),"received")]]
Be careful, you are missing the closing ')':
//div[contains(text(),"Review") and contains(text(),"received")]
But this is still not the right XPath, because "received" is in an inner element.
Try this; .//* means any descendant element (you can use ./div in the second contains instead):
//div[.//*[contains(text(),"Review")] and .//*[contains(text(),"received")]]
or
//div[contains(text(),"Review") and .//*[contains(text(),"received")]]
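To see why the first attempt fails, here is a rough sketch with Python's standard-library ElementTree instead of a browser; the markup is a hypothetical stand-in for the question's page. An element's `.text` is only its own leading text node, which is roughly what `contains(text(), ...)` inspects, so text that lives in a child element has to be reached through the child, as the answer's `.//*[...]` does.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "Review" is the outer div's own text, "received" sits in a child div
outer = ET.fromstring("<div>Review <div>received</div></div>")

# The outer div's own leading text node, roughly what contains(text(), ...) looks at
own_text = outer.text or ""
print(repr(own_text))  # 'Review '

# Emulates the answer's predicate:
# contains(text(),"Review") and .//*[contains(text(),"received")]
matches = "Review" in own_text and any(
    "received" in (el.text or "") for el in outer.iter() if el is not outer
)
print(matches)  # True

# Checking only the div's own text misses the child's text
print("received" in own_text)  # False
```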
Related
I'm trying to select and copy all the text in a div, except the H1 tag. The text looks as shown in the image.
To select all the content, I can use the code below:
browser.find_element(By.XPATH, "//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div").send_keys(Keys.CONTROL + "a") #select all
But when I try to exclude the H1 tag content, which is "Do You Have Dry Pack...", using its XPath, which is
"//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div/h1"
with code like the one below, it shows an error:
browser.find_element(By.XPATH, "//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div"[not(("//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div/h1"))]).send_keys(Keys.CONTROL + "a") #select all
The error is
How can I overcome this?
Firstly, you can use //, which selects nodes in the document from the current node that match the selection, no matter where they are (see XPath Syntax).
Secondly, about your error: you define the string in Python with double quotes, but there are double quotes inside it, so the string ends early. You have to escape the inner quotes with the \ character. For example:
browser.find_element(By.XPATH, "//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div\"[not((\"//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div/h1\"))]").send_keys(Keys.CONTROL + "a")
This only fixes the Python syntax error; the resulting expression is still not valid XPath, so it also needs restructuring.
Lastly, instead of using an element's index (e.g. div[2]), you should find another way to define a unique path to that element; the element's id, class, or other attributes are commonly used. If that doesn't work, you can use // to reach the child nodes, then use .. to reach their parent.
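The "anchor on an attribute, go down with //, come back up with .." idea can be tried offline with Python's ElementTree, which supports //, [@attr='value'] and .. in its limited XPath subset. This is a sketch on a hypothetical page skeleton; the class name "article" is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page skeleton; the class name "article" is made up for illustration
page = ET.fromstring(
    "<body><div id='__next'><div><div class='article'>"
    "<h1>Title</h1><p>Body text.</p>"
    "</div></div></div></body>"
)

# Brittle: depends on every intermediate level staying exactly the same
by_index = page.find("./div/div/div")

# Sturdier: anchor on the id, jump down with // to the h1, then step up with ..
by_anchor = page.find(".//div[@id='__next']//h1/..")

print(by_anchor.get("class"))  # article
print(by_index is by_anchor)   # True
```

Both paths reach the same element here, but only the second keeps working if a wrapper div is added or removed above the h1.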
With that XPath you are getting the entire div, but when you add the "not" you are saying: "give me all the divs that don't have an h1 inside". But you want to exclude only the h1, so you need to take the elements INSIDE THE DIV, not the div itself. You can do that with this XPath:
/div/*[not(self::h1)]
//*[@id='__next']/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div[2]/div/*[not(self::h1)]
Then in the code you have to get this as a list (find_elements instead of find_element), not as a single element.
And please try to write cleaner and less fragile XPaths, as the other answer said; something like //*[@id='__next']//h1 is better.
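The idea behind `*[not(self::h1)]` (keep every child of the div except the h1, and handle the result as a list) can be sketched in plain Python with ElementTree. The markup below is a hypothetical stand-in for the question's deeply nested div, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the deeply nested div in the question
div = ET.fromstring(
    "<div>"
    "<h1>Do You Have Dry Pack...</h1>"
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "</div>"
)

# Same idea as div/*[not(self::h1)]: every child element except the h1
kept = [child for child in div if child.tag != "h1"]

# The result is a list of elements, so join their text explicitly
text = "\n".join("".join(child.itertext()) for child in kept)
print(text)
```

With Selenium, the list would come from `browser.find_elements(By.XPATH, ...)` with the `*[not(self::h1)]` XPath, and each element's `.text` could be joined the same way.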
I am scraping this webpage and while trying to extract text from one element, I am hitting a dead end.
So the element in question is shown below in the image -
The text in this element is within the <p> tags inside the <div>. I tried extracting the text in the scrapy shell using the following code - response.css("div.home-hero-blurb no-select::text").getall(). I received an empty list as the result.
Alternatively, if I try going a bit further and reference the <p> tags individually, I can get the text. Why does this happen? Isn't the <div> a parent element and shouldn't my code extract the text?
Note - I wanted to use the div because I thought that'll help me get both the <p> tags in one query.
I can see two issues here.
The first is that if you separate the class names with a space, the CSS selector will understand that you are looking for a descendant element with that tag name. So the correct approach is "div.home-hero-blurb.no-select::text" instead of "div.home-hero-blurb no-select::text".
The second issue is that the text you want is inside a p element that is a child of that div. If you only select the div, the selector returns the text directly inside the div, but not the text of its children. Since there is also a strong element as a child of p, I would suggest a more general approach like:
response.css("div.home-hero-blurb.no-select *::text").getall()
This should return all the text from the div's descendants.
It's worth pointing out that extracting text with ::text is an extension of the standard CSS selectors; Scrapy mentions this in its documentation.
Edit
If you were to use XPath, this would be the equivalent expression:
response.xpath('//div[@class="home-hero-blurb no-select"]//text()').getall()
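The difference between the div's own text nodes (`div::text`) and the text inside its descendants (`div *::text`) can be reproduced without Scrapy. This sketch uses ElementTree on hypothetical markup modeled on the question's description (two p tags, one containing a strong tag), not on the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the description: text lives in <p> (and <strong>) children
div = ET.fromstring(
    '<div class="home-hero-blurb no-select">\n'
    "<p>First <strong>bold</strong> line.</p>\n"
    "<p>Second line.</p>\n"
    "</div>"
)

# Like "div::text": only text nodes that are direct children of the div (here just newlines)
own = [t for t in [div.text] + [child.tail for child in div] if t and t.strip()]
print(own)  # []

# Like "div *::text": text nodes inside every descendant element, in document order
descendant = [t for child in div for t in child.itertext()]
print(descendant)  # ['First ', 'bold', ' line.', 'Second line.']
```

This mirrors the empty list from the original query: the div itself holds only whitespace, and all the visible text sits one level or more below it.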
I have the following:
This is my text string and this next <a href='https//somelink.org/'>part</a> is only partially enclosed in a tags.
In the above string I have to search for "next part", not only "part". Once I find "next part", I need to check whether an a tag is present in the matched text (sometimes there is no a tag). How can I do that?
In addition to my main question, I can't get my XPath to find "next part" in the elements.
I tried this:
//*[contains(text(),"next part")]
But it doesn't find anything probably because I have spaces in there - how do I overcome this?
Thank you in advance,
Let's assume this html:
<p>This is my text string and this next <a href='https//somelink.org/'>part</a> is only partially enclosed in a tags.</p>
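We can select it with Selenium (find_element_by_xpath was removed in Selenium 4, so use find_element with By.XPATH):
import re
from selenium.webdriver.common.by import By
p = driver.find_element(By.XPATH, '//p[contains(.,"next part")]')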
And we can determine if it's partly in an a tag with regex (Tony the Pony notwithstanding):
html = p.get_attribute('innerHTML')
partly_in_a = 'next part' in re.sub(r'</?a.*?>', '', html) and 'next part' not in html
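The same regex check can be tried without a browser on the innerHTML string from the question. This is a small variant, not the answer's exact code: the pattern `</?a\b[^>]*>` is slightly tighter than `</?a.*?>` so that tags such as `<abbr>` are not stripped, and the function name `partly_in_anchor` is hypothetical.

```python
import re

def partly_in_anchor(inner_html: str, phrase: str) -> bool:
    # Remove opening and closing <a ...> tags but keep the text they wrap
    stripped = re.sub(r"</?a\b[^>]*>", "", inner_html)
    # The phrase only appears once the tags are gone -> an <a> tag splits it
    return phrase in stripped and phrase not in inner_html

html = ("This is my text string and this next <a href='https//somelink.org/'>part</a> "
        "is only partially enclosed in a tags.")
print(partly_in_anchor(html, "next part"))  # True
print(partly_in_anchor("this next part has no link", "next part"))  # False
```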
There's no pure xpath 1.0 solution for this, and it's a mistake in general to depend on xpath for stuff like this.
You'll need to use a nested XPath selector for this.
//*[contains(text(), 'next') and a[contains(text(), 'part')]]
This matches any element whose text contains next and that also has a nested a element whose text contains part.
To determine whether or not there actually IS a nested a tag, you will need to write a method for this that checks against two different XPaths. There is no easy way around this, other than to evaluate the elements and see what's there.
public bool DoesElementHaveNestedTag()
{
    // check for presence of the locator with the nested tag
    // FindElements returns an empty collection (it does not throw) when nothing matches,
    // so a count > 0 means the nested tag locator exists
    return driver.FindElements(By.XPath("//*[contains(text(), 'next') and a[contains(text(), 'part')]]")).Count > 0;
}
You can change this method to fit your needs, but the idea is the same. There is no way to know if a WebElement has a nested tag or not, unless you try to find the WebElement using two XPaths -- one that checks for the tag, and one that does not.
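For illustration outside Selenium, the predicate of the first XPath can be emulated in Python with ElementTree, whose own XPath subset has no `contains()`. The function name `has_nested_anchor` and the second sample sentence are hypothetical; running it on both markups shows the presence/absence check that the method above performs.

```python
import xml.etree.ElementTree as ET

def has_nested_anchor(root: ET.Element, outer_word: str, inner_word: str) -> bool:
    # Mirrors //*[contains(text(), outer_word) and a[contains(text(), inner_word)]].
    # el.text is the text before an element's first child, i.e. its first text node
    # when the element starts with text, which is what contains(text(), ...) inspects.
    for el in root.iter():
        if outer_word in (el.text or "") and any(
            inner_word in (a.text or "") for a in el.findall("a")
        ):
            return True
    return False

with_link = ET.fromstring(
    "<p>This is my text string and this next "
    "<a href='https//somelink.org/'>part</a> is only partially enclosed in a tags.</p>"
)
# Hypothetical variant of the same sentence without the <a> tag
without_link = ET.fromstring("<p>This is my text string and this next part has no link.</p>")

print(has_nested_anchor(with_link, "next", "part"))     # True
print(has_nested_anchor(without_link, "next", "part"))  # False
```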
I am trying to find the input with name statusid_103408 and the text Draft.
Here is the XPath I am using; I'm not sure where I am going wrong:
//input[@name='statusid_103408' and contains(text(), 'Draft')]
The reason this xpath does not work is because the text of "Draft" is not actually a property of the input element. It is contained in the li element that is the parent. Therefore, your search is returning no results.
I suggest using only the name in your XPath search (if it is unique). If you definitely need the text in your search, you can search for the li item's text first, then find your input, like so:
//li[text()='Draft']/input[@name='statusid_103408']
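A small sketch of why the text has to be matched on the li rather than on the input, using Python's ElementTree. The markup, including the second "Final" option and the value attributes, is hypothetical. ElementTree's `[.='Draft']` predicate (Python 3.7+) compares an element's full text content, which here behaves like `li[text()='Draft']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the label text is a text node of the <li>, not of the <input>
html = """<ul>
<li><input type="radio" name="statusid_103408" value="1"/>Draft</li>
<li><input type="radio" name="statusid_103408" value="2"/>Final</li>
</ul>"""

root = ET.fromstring(html)

# An <input> is an empty element, so it has no text of its own to match against
inputs = root.findall(".//input[@name='statusid_103408']")
print([i.text for i in inputs])  # [None, None]

# Match the li by its text first, then step down to the input
draft = root.find(".//li[.='Draft']/input[@name='statusid_103408']")
print(draft.get("value"))  # 1
```

In this sample the name alone matches two inputs, which is when filtering by the li text (or by a unique value, as the next answer suggests) becomes necessary.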
Use the value attribute instead; it will work because the value is unique, and the text is not inside the input tag.
I have an XML file like the one below. How can I select only the tag whose href attribute ends with parent, like the third element below?
Determining it by position, like
elem = tree.findall('{*}CustomProperty')[2]
does not fit, because some documents might have only one parent href, others 5-10, and a third might not have such hrefs at all.
I tend to use XPath, but I can't figure out how to tell XPath to match the end of an attribute value.
XPath is not a must, though; I will be glad to use any approach that fits my purpose.
So how can I get CustomProperty element which has a href attribute that ends with word parent ?
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566:parent">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
Thank you in advance for help
Try using the contains() function to find the element with an href attribute that contains the word parent:
//*[contains(@href, 'parent')]
or, if you are sure that "parent" is at the end, you can use ends-with() (an XPath 2.0 function):
//*[ends-with(@href, 'parent')]
Does
//CustomProperty[contains(@href, 'parent') and substring-after(@href, 'parent') = '']
cater to your requirements? One issue with the suggestion is that it fails for href attributes where parent occurs more than once.
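The caveat about parent occurring more than once can be checked by emulating the two XPath 1.0 functions in Python. `substring_after` and `answer_predicate` are hypothetical helpers written only for this illustration, and the second href is made up to trigger the failure case.

```python
def substring_after(s: str, sub: str) -> str:
    # XPath 1.0 substring-after(): the text after the FIRST occurrence of sub, or "" if absent
    _before, sep, after = s.partition(sub)
    return after if sep else ""

def answer_predicate(href: str) -> bool:
    # contains(@href, 'parent') and substring-after(@href, 'parent') = ''
    return "parent" in href and substring_after(href, "parent") == ""

ok = "urn:1653267:643562dafewq:cs:46wey5ge:234566:parent"
tricky = "urn:parent:1653267:234566:parent"  # hypothetical href with 'parent' twice

print(answer_predicate(ok))       # True
print(answer_predicate(tricky))   # False, even though it ends with 'parent'
print(tricky.endswith("parent"))  # True
```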
If your xpath processor supports xpath 2.0, use aberna's suggestion.
Remember to replace the '//' axis with specific paths wherever possible, for performance reasons.
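Since the question already uses Python's ElementTree (`tree.findall('{*}CustomProperty')`), whose limited XPath subset has no `ends-with()`, one option is to let ElementTree find the elements and filter the `href` values in Python. This is a sketch; the `<Properties>` root element is hypothetical, added only so the sample parses as a single document.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a hypothetical <Properties> root so it parses
xml = """<Properties>
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
<CustomProperty href="urn:1653267:643562dafewq:cs:46wey5ge:234566:parent">urn:1653267:643562dafewq:cs:46wey5ge:234566:ss</CustomProperty>
</Properties>"""

root = ET.fromstring(xml)

# '{*}' (Python 3.8+) matches the tag in any namespace or none, as in the question;
# the end-of-attribute test is done in Python because ElementTree lacks ends-with()
parents = [
    el for el in root.findall("{*}CustomProperty")
    if el.get("href", "").endswith("parent")
]
print(len(parents))  # 1
```

Because the filter does not depend on position, it returns an empty list for documents with no such href and every match for documents with several.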