Can't Find Suitable Xpath or CSS Selector - python

Anyone know how to select this via xpath or css selector? The class name is used elsewhere in the HTML markup. Also this div isn't always present in each information block.
I need the following output in this div:
Price: $15.77 (13% off)
Here's the link if you need to see the source code in more detail: https://www.amazon.com/gp/goldbox/ref=nav_cs_gb

You can get required output as
driver.find_element_by_xpath('//div[normalize-space(span)="Price:"]').text

Related

InvalidSelectorException Error while trying to get text from div class in Selenium Python

I'm trying to get text using Selenium WebDriver and here is my code. Please note that I don't want to use XPath, because in my case the ID gets changed on every relaunch of the web page.
My code:
driver.find_element_by_class_name("05uR6d").text
HTML:
<div class="O5uR6d">to fasten stuff</div>
Error:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified (Session info: chrome=88.0.4324.150)
Error is specific to the line of code I mentioned above.
How can I fix this?
Use this xpath:
driver.find_element_by_xpath("//div[contains(text(),'to fasten stuff')]")
Or this CSS:
driver.find_element_by_css_selector(".O5uR6d")
If both won't work, improve your question by adding more data of HTML you are looking at.
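A likely cause of the InvalidSelectorException, as far as I can tell: Selenium turns a class-name lookup into the CSS selector .05uR6d, and a CSS identifier cannot start with a digit. Note also that the code uses a zero (05uR6d) while the HTML uses the letter O (O5uR6d). Here is a minimal sketch of that rule. The helper name is_plain_css_class is hypothetical, and the regex is a simplified, ASCII-only approximation of the CSS identifier grammar:

```python
import re

# Simplified assumption: a class usable as ".name" must start with a letter
# or underscore (optionally preceded by one hyphen), never with a digit.
_CSS_IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*\Z")


def is_plain_css_class(name):
    """Return True if `name` can be written directly after a dot in CSS."""
    return bool(_CSS_IDENT.match(name))


print(is_plain_css_class("05uR6d"))  # False: starts with the digit zero
print(is_plain_css_class("O5uR6d"))  # True: starts with the letter O
```

So fixing the typo (letter O instead of zero) should be enough for the class-name lookup to work.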
This can be done in several ways; let me try to explain most of them.
Get element by class name.
This is the easiest way to get an element by its class name:
driver.find_element_by_class_name('foo')
Get Element by xpath
This one is a bit trickier. You can build the XPath on the class name, title, id, or any other attribute that stays the same. It also works on the text inside your div. For example:
driver.find_element_by_xpath("//tagname[@attribute='value']")
or in your case:
driver.find_element_by_xpath("//div[@class='O5uR6d']")
or you can do what @vitaliis suggested:
driver.find_element_by_xpath("//div[contains(text(),'to fasten stuff')]")
You can read more about xpath and how to find it on this link
Get Elements by ID:
You can also get the element by its id, if it has one that is static:
driver.find_element_by_id('baz')
Get Elements by Name:
Get Elements by name using the following syntax:
driver.find_element_by_name('bazz')
Using CSS Selectors:
You can also use CSS selectors to find elements. Consider the following tag, which has a class attribute:
<p class="content">Site content goes here.</p>
You can get this element by:
driver.find_element_by_css_selector('p.content')
You can read more about it over here

Empty list as output from scrapy response object

I am scraping this webpage and while trying to extract text from one element, I am hitting a dead end.
So the element in question is shown below in the image -
The text in this element is within the <p> tags inside the <div>. I tried extracting the text in the scrapy shell using the following code - response.css("div.home-hero-blurb no-select::text").getall(). I received an empty list as the result.
Alternatively, if I try going a bit further and reference the <p> tags individually, I can get the text. Why does this happen? Isn't the <div> a parent element and shouldn't my code extract the text?
Note - I wanted to use the div because I thought that'll help me get both the <p> tags in one query.
I can see two issues here.
The first is that if you separate the class names with a space, the CSS selector treats the second name as a descendant element with that tag name. So the correct approach is "div.home-hero-blurb.no-select::text" instead of "div.home-hero-blurb no-select::text".
The second issue is that the text you want is inside a p element that is a child of that div. If you only select the div, the selector will return the text directly inside the div, but not the text in its children. Since there is also a strong element as a child of p, I would suggest a more general approach like:
response.css("div.home-hero-blurb.no-select *::text").getall()
This should return all text from the div and its descendants.
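To illustrate the difference between a div's own text and its descendants' text without a Scrapy shell, here is a small sketch using only Python's standard xml.etree.ElementTree. The markup is a made-up stand-in for the page, not the real HTML:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the div holds no text itself, only <p> children.
markup = (
    '<div class="home-hero-blurb no-select">'
    "<p>First <strong>bold</strong> part.</p>"
    "<p>Second part.</p>"
    "</div>"
)

div = ET.fromstring(markup)

own_text = div.text              # text directly inside the div, before any child
all_text = list(div.itertext())  # text of the div and every descendant

print(own_text)  # None
print(all_text)  # ['First ', 'bold', ' part.', 'Second part.']
```

The first result corresponds to selecting only the div's own text, the second to selecting the text of all descendants.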
It's relevant to point out that extracting text with ::text is an extension to the standard CSS selectors. Scrapy mentions this here.
Edit
If you were to use XPath, this would be the equivalent expression:
response.xpath('//div[@class="home-hero-blurb no-select"]//text()').getall()

Using Scrapy Python not able to extract data from response html with xpath due to namespace

I am using scrapy with xpath to extract data from a webpage. My html response looks like this,
I want to extract the href link present in the highlighted "a" tag.
Usually I use response.xpath('//a[@id="jr-alt-sw"]/@href') to get the data, but here the result is empty, I think because of a namespace problem. How can I get the data when a namespace is present?
Any help is appreciated!!
Is that true about namespace? Another reason to use css instead:
response.css('a#jr-alt-sw::attr(href)')
There is no href attribute on the a tag you selected. Look at the next a tag, which does contain an href attribute:
response.xpath('//a[@id="jr-pdf-sw"]/@href')

How to get href link from in this a tag?

I successfully get href link from http://quotes.toscrape.com/ example by implementing:
response.css('div.quote > span > a::attr(href)').extract()
and it gives all the partial links inside the href of each a tag:
['/author/Albert-Einstein', '/author/J-K-Rowling', '/author/Albert-Einstein', '/author/Jane-Austen', '/author/Marilyn-Monroe', '/author/Albert-Einstein', '/author/Andre-Gide', '/author/Thomas-A-Edison', '/author/Eleanor-Roosevelt', '/author/Steve-Martin']
By the way, in the above example each a tag has this format:
<a href="/author/Albert-Einstein">(about)</a>
So I tried to make the same for this site: http://www.thegoodscentscompany.com/allproc-1.html
The problem here is that the style of a tag is a bit different as such:
formaldehyde
As you can see, I can't get the link from href with a similar method. I want to get the link (http://www.thegoodscentscompany.com/data/rw1247381.html) from this a tag, but I could not manage it. How can I get this link?
Try this response.css('a::attr(onclick)').re(r"Window\('(.*?)'\)")
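The same regular expression can be tried outside Scrapy with the standard re module. This is only a sketch: the onclick string is an assumed example built around the URL from the question, and the function name openMainWindow is hypothetical (the pattern only needs a name ending in "Window"):

```python
import re

# Assumed onclick value; only the URL comes from the question.
onclick = (
    "openMainWindow("
    "'http://www.thegoodscentscompany.com/data/rw1247381.html'"
    ");return false;"
)

# Capture whatever sits between Window(' and ').
links = re.findall(r"Window\('(.*?)'\)", onclick)

print(links)  # ['http://www.thegoodscentscompany.com/data/rw1247381.html']
```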

What is the XPath for the following element?

I need to find the XPath for the following element below:
<a> href="/resident/register/">Register another device </a>
I assumed the solution would be
$x("//*[contains(@href, 'resident/register')]")
But this has returned nothing. Any ideas?
Your HTML is malformed.
Change
<a> href="/resident/register/">Register another device </a>
to
<a href="/resident/register/">Register another device </a>
then your XPath will work as expected.
If you cannot change the HTML, then you'll have to adjust your XPath to test the element content rather than the href attribute content:
//a[contains(.,'resident/register')]
but, although this can select the malformed a element, it won't be clickable since it lacks a proper href attribute.
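To see why the attribute test finds nothing, here is a small sketch using Python's standard xml.etree.ElementTree instead of a browser console. The wrapping div and the helper name describe are assumptions added for illustration:

```python
import xml.etree.ElementTree as ET

# Markup as posted (malformed) and as it was probably intended.
malformed = '<div><a> href="/resident/register/">Register another device </a></div>'
intended = '<div><a href="/resident/register/">Register another device </a></div>'


def describe(markup):
    """Return (href attribute, text content) of the first <a> element."""
    a = ET.fromstring(markup).find(".//a")
    return a.get("href"), a.text


print(describe(malformed))  # (None, ' href="/resident/register/">Register another device ')
print(describe(intended))   # ('/resident/register/', 'Register another device ')
```

In the malformed version the href is just part of the element's text, so only a test on the content can match it.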
To start with, @kjhughes nailed one of the issues: the HTML is malformed, presumably because you reproduced it from your own understanding.
The actual HTML, I suppose, is as follows:
<a href="/resident/register/">Register another device</a>
So an effective XPath for the element can be either of the following:
XPath 1:
"//a[contains(@href, '/resident/register') and contains(.,'Register another device')]"
XPath 2:
"//a[contains(@href, '/resident/register') and normalize-space()='Register another device']"
