XPath doesn't work on this website - Python

I am scraping individual listing pages from justproperty.com (the individual listing from the original question is no longer active).
I want to get the value of the Ref field.
This is my XPath:
>>> sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]
This returns no results in Scrapy, despite working in my browser.

The following works perfectly in lxml.html (which modern Scrapy uses):
sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')
Note that I'm using // to get from the div to the td rather than laying out the explicit path. I'd have to take a closer look at the document to grok why, but the explicit path given in that area was incorrect.
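A quick way to sanity-check that expression outside Scrapy is to run it against a plain lxml.html document. The snippet below is only a sketch: the HTML fragment and the "JP-12345" value are made up to stand in for the real listing markup, and the point is just that the descendant // step reaches the td without naming every intermediate element.

import lxml.html

# Illustrative stand-in for the listing page; the real info_div holds a larger table.
html = '''
<html><body>
  <div class="info_div">
    <table>
      <tr><td class="titles">Ref:</td><td>JP-12345</td></tr>
    </table>
  </div>
</body></html>
'''
doc = lxml.html.fromstring(html)
print(doc.xpath('.//div[@class="info_div"]//td[text()="Ref:"]'
                '/following-sibling::td[1]/text()'))
# ['JP-12345']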

Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they change the markup. Remove the /tbody axis step and you'll get exactly what you're looking for.
normalize-space(.//div[@class="info_div"]/table/tr/td[
normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())
Read Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing? for more details.
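To see the /tbody point concretely, here is a hedged sketch using parsel (the selector library Scrapy is built on) against the same kind of made-up fragment: the HTML the server sends has no tbody, so the path copied from the browser matches nothing, while the corrected path does.

from parsel import Selector

html = '<div class="info_div"><table><tr><td>Ref:</td><td>JP-12345</td></tr></table></div>'
sel = Selector(text=html)

# Path copied from browser dev tools, with the browser-inserted /tbody: no match.
print(sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr'
                '/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').get())
# ''

# Same path without /tbody: matches.
print(sel.xpath('normalize-space(.//div[@class="info_div"]/table/tr'
                '/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').get())
# 'JP-12345'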

Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]
I tried your XPath using XPath Checker and it works fine.

Related

Get Xpath for Element in Python

I've been researching this for two days now. There seems to be no simple way of doing this. I can find an element on a page by downloading the HTML with Selenium and passing it to BeautifulSoup, followed by a search via classes and strings. I want to click on this element after finding it, so I want to pass its XPath to Selenium. I have no minimal working example, only pseudo-code for what I'm hoping to do.
Why is there no function/library that lets me search through the HTML of a webpage, find an element, and then request its XPath? I can do this manually by inspecting the webpage and clicking 'Copy XPath'. I can't find any solutions to this on Stack Overflow, so please don't tell me I haven't looked hard enough.
Pseudo-Code:
*parser is BeautifulSoup HTML object*
for box in parser.find_all('span', class_="icon-type-2"): # find all elements with particular icon
    xpath = box.get_xpath()
I'm willing to change my code entirely, as long as I can locate a particular element and extract its XPath. So any other ideas on entirely different libraries are welcome.
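BeautifulSoup has no built-in get_xpath(), but an indexed XPath can be assembled by walking up from the tag and counting same-named siblings at each level. The sketch below is one possible approach rather than a library feature: the 'icon-type-2' class comes from the pseudo-code above, build_xpath is a hypothetical helper name, and an existing Selenium driver is assumed. The result can also drift from the browser's DOM if the two parsers normalize the markup differently (tbody insertion being the classic case).

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

def build_xpath(element):
    """Return an indexed XPath such as /html[1]/body[1]/... for a bs4 tag."""
    parts = []
    while element is not None and element.name not in (None, "[document]"):
        # 1-based position among preceding siblings that share the same tag name.
        position = len(element.find_previous_siblings(element.name)) + 1
        parts.append(f"{element.name}[{position}]")
        element = element.parent
    return "/" + "/".join(reversed(parts))

parser = BeautifulSoup(driver.page_source, "html.parser")  # driver: existing Selenium session
for box in parser.find_all("span", class_="icon-type-2"):
    xpath = box.get_xpath() if hasattr(box, "get_xpath") else build_xpath(box)
    driver.find_element(By.XPATH, xpath).click()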

Python Selenium Find Element

I'm searching for a tag within a class. I tried many methods but I couldn't get the value.
see source code
The data I need is inside the "data-description".
How can I get the "data-description"?
I tried some methods but they didn't work:
driver.find_element_by_name("data-description")
driver.find_element_by_css_selector("data-description")
I solved it with this method:
icerisi = browser.find_elements_by_class_name('integratedService ')
for mycode in icerisi:
    hizmetler.append(mycode.get_attribute("data-description"))
Thanks for your help.
I think a CSS selector would work best here. "data-description" isn't an element, it's an attribute of an element. The CSS selector for an element with a given attribute would be:
[attribute]
Or, to be more specific, you could use:
[attribute="attribute value"]
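As a hedged sketch of how that looks with Selenium (Selenium 4's find_elements(By.CSS_SELECTOR, ...) API and an already-created driver are assumed; the attribute name comes from the question, the example value is made up):

from selenium.webdriver.common.by import By

# every element that carries a data-description attribute at all
elements = driver.find_elements(By.CSS_SELECTOR, "[data-description]")

# or only elements whose data-description has a specific value
elements = driver.find_elements(By.CSS_SELECTOR, '[data-description="Some service"]')

descriptions = [e.get_attribute("data-description") for e in elements]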
Here's a good tip:
Most web browsers have a way of copying an element's Selector or XPath. For example, in Safari, if you view the source code and then right-click on an element, it will give you the option to copy it. Then select XPath or Selector, and in your code use driver.find_element_by_xpath() or driver.find_element_by_css_selector(). I am certain Google Chrome and Firefox have similar options.
This method is not always failsafe, as the XPath can be very specific, meaning that slight changes to the website will cause your script to crash, but it is a quick and easy solution, and it is especially useful if you don't plan on reusing your code months or years later.

Selecting the first link in a google search

When I inspect the website (a Google search results page), I'm able to select my desired href by searching for //div[@class="r"]/a/@href through the finder. But when using Scrapy and accessing it with response.xpath('//div[@class="r"]/a/@href'), this returns empty. Many other XPaths, such as the link title, also come back empty. Strangely enough, I'm able to get something when using response.xpath('//cite').get(), which is basically the href but incomplete.
If I do response.body I'm able to see my desired href deep in the code, but I have no idea how to access it. Trying to select it through the usual CSS or XPath methods that would work on any other website has been futile.
The reason the XPath you're using works in your browser but not in the response is that Google displays the page differently when JS is disabled, which is the case for Scrapy but not for your browser, so you'll need to use an XPath that works for both, or just for the no-JS case.
This one works for no JS but won't work in the browser (if JS is enabled):
//div[@id='ires']//h3/a[1]/@href
This will return the first URL of the first result.
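As a minimal sketch of using that expression inside a Scrapy spider (the XPath is the one from this answer; the spider name and search URL are illustrative, and the no-JS markup changes often, so treat it as fragile):

import scrapy

class FirstResultSpider(scrapy.Spider):
    name = "first_result"                                     # illustrative
    start_urls = ["https://www.google.com/search?q=scrapy"]   # illustrative query

    def parse(self, response):
        first_href = response.xpath("//div[@id='ires']//h3/a[1]/@href").get()
        if first_href:
            # the href may be relative (e.g. /url?q=...), so make it absolute
            yield {"first_result": response.urljoin(first_href)}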
Try the below.
response.xpath("//div[@class='r']").xpath("//a/@href").extract()

Find elements with specific class name

For Chrome, I installed ChroPath to find elements on the page.
I want to find an XPath for the Like elements on an Instagram page, but it seems this doesn't work:
//span[contains(@class,'glyphsSpriteHeart__outline__24__grey_9 u-__7')]
I also tried this:
/html[1]/body[1]/div[3]/div[1]/div[2]/div[1]/article[1]/div[2]/section[1]/span[1]/button[1]/span[1]
When Selenium clicks, I get:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div._2dDPU.vCf6V div.zZYga div.PdwC2._6oveC article.M9sTE.L_LMM.JyscU div.eo2As section.ltpMr.Slqrh span.fr66n button.coreSpriteHeartOpen.oF4XW.dCJp8 > span.glyphsSpriteHeart__outline__24__grey_9.u-__7"}
How can I find the XPath? Any good extension or something?
You cannot "find" the XPath of an element. There are many, many XPaths that will locate any given element. Some will be stable, others will be unstable. The decision on which XPath to use is based upon your understanding and experience of Selenium, and your understanding of how the Application Under Test is written and behaves.
If you are looking for a tool to experiment with different XPaths, Chrome's built-in Developer Tools Console allows you to test both XPath and CSS selectors.
In your specific scenario of finding elements by class name, a CSS selector is a much better choice than XPath, as CSS selectors treat multiple classes as separate tokens, whereas XPath sees "class" as a single literal string, which is why you needed to use contains().
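A short sketch of that difference (Selenium 4 API assumed, with an already-created driver; the class names are the ones from the question):

from selenium.webdriver.common.by import By

# CSS: each class token is matched independently, so order and extra classes don't matter.
likes = driver.find_elements(
    By.CSS_SELECTOR, "span.glyphsSpriteHeart__outline__24__grey_9.u-__7")

# XPath: @class is one literal string, so a partial match needs contains().
likes = driver.find_elements(
    By.XPATH, "//span[contains(@class, 'glyphsSpriteHeart__outline__24__grey_9')]")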
This might help:
https://selectorgadget.com/
This as well, to understand what you are manipulating:
https://www.w3schools.com/xml/xpath_syntax.asp
As for your example where you go down the tree using index numbers (i.e. /html[1]/body[1]), a slight change in the site will make your script fail. Find a way to build something more robust! Also have a look at CSS selectors if your object's appearance is known in advance.
To get all Like buttons on Instagram, use the CSS selector below:
span[aria-label="Like"]
You can get some helpful details here: https://www.w3schools.com/cssref/css_selectors.asp
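A hedged usage sketch with that selector (Selenium 4 API and an already-created, logged-in driver on the target page are assumed):

from selenium.webdriver.common.by import By

for span in driver.find_elements(By.CSS_SELECTOR, 'span[aria-label="Like"]'):
    span.click()  # the click lands on the span and is handled by the enclosing button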

Scrapy Python Web Scraping - creating XPath

I am trying to create a "universal" XPath, so that when I run the spider it will be able to download the hotel name for each hotel on the list.
This is the XPath that I need to convert:
//*[@id="offerPage"]/div[3]/div[1]/div[1]/div/div/div/div/div[2]/div/div[1]/h3/a
Can anyone point me the right direction?
This is an example of how they did it in the Scrapy docs:
https://github.com/scrapy/quotesbot/blob/master/quotesbot/spiders/toscrape-xpath.py
For the text, they have:
'text': quote.xpath('./span[@class="text"]/text()').extract_first(),
When you open "http://quotes.toscrape.com/" and copy the XPath for the text, you will get:
/html/body/div/div[2]/div[1]/div[1]/span[1]
When you look at the HTML that you are scraping, just using "copy XPath" from the browser source viewer is not enough.
You need to look at the attributes that the HTML tags have.
Of course, using just tag types as an XPath can work, but what if not every page you are going to scrape follows that pattern?
The Scrapy example you are using uses the span's class attribute to precisely point to the target tag.
I suggest reading a bit more about XPath (for example here) to understand how flexible your search patterns can be.
If you want to go even broader, reading about DOM structure will also be useful. Let us know if you need more pointers.
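Applied to the question above, one hedged sketch of that idea is to keep the stable pieces of the copied XPath (the offerPage id and the h3/a leaf) and replace the long chain of indexed divs with a descendant step; whether this selects exactly one link per hotel is an assumption about the real page, so check it in scrapy shell first.

def parse(self, response):
    # '//*[@id="offerPage"]' and the 'h3/a' leaf come from the question's copied XPath;
    # the '//' in between drops the brittle div[3]/div[1]/... indexing.
    for name in response.xpath('//*[@id="offerPage"]//h3/a/text()').getall():
        yield {"hotel_name": name.strip()}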
