I'm trying to scrape a website using Selenium.
I tried using XPath, but the problem is that the rows on the website change over time...
How can I scrape the website so that it gives me the output '21,73'?
<div class="weather_yesterday_mean">21,73</div>
You can just use find_element_by_css_selector, which accepts CSS selectors. I personally like them way more than XPath:
elem = driver.find_element_by_css_selector('div.weather_yesterday_mean')
result = elem.text
If that suits you, please read a bit about CSS selectors, for example here: https://www.w3schools.com/cssref/css_selectors.asp
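If you want to sanity-check the selector logic without launching a browser, the same lookup can be sketched with lxml (the HTML parser Scrapy builds on); here the CSS selector `div.weather_yesterday_mean` is translated to its XPath equivalent, using the snippet from the question:

```python
import lxml.html

# The HTML fragment from the question, wrapped in a parent element.
snippet = '<div><div class="weather_yesterday_mean">21,73</div></div>'
root = lxml.html.fromstring(snippet)

# XPath equivalent of the CSS selector 'div.weather_yesterday_mean'.
value = root.xpath('//div[@class="weather_yesterday_mean"]/text()')[0]
print(value)  # 21,73
```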
Related
I'm trying to download two fields from a webpage, I identify the XPath expressions for each one and then run the spider, but nothing is downloaded.
The webpage:
http://www.morningstar.es/es/funds/snapshot/snapshot.aspx?id=F0GBR04MZH
The field I want to itemize is ISIN.
The spider runs without errors, but the output is empty.
Here is the line code:
item['ISIN'] = response.xpath('//*[@id="overviewQuickstatsDiv"]/table/tbody/tr[5]/td[3]/text()').extract()
Try to remove tbody from XPath:
'//*[@id="overviewQuickstatsDiv"]/table//tr[5]/td[3]/text()'
Note that this tag is added by your browser while rendering the page and is absent from the page source.
P.S. I'd suggest an IMHO even better XPath:
'//td[.="ISIN"]/following-sibling::td[contains(@class, "text")]/text()'
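To see why this label-based XPath is more robust than a positional one, it can be tried offline with lxml on a minimal table mimicking the quick-stats layout (the markup and the ISIN value below are made up for illustration):

```python
import lxml.html

# Hypothetical markup mirroring the quick-stats table; the ISIN value
# is invented for the example.
snippet = '''
<div><table>
  <tr><td class="line heading">ISIN</td>
      <td class="line text">GB00B39R2S49</td></tr>
</table></div>
'''
root = lxml.html.fromstring(snippet)

# Anchor on the label cell, then take the sibling value cell.
isin = root.xpath('//td[.="ISIN"]/following-sibling::td[contains(@class, "text")]/text()')
print(isin)  # ['GB00B39R2S49']
```

Because it anchors on the "ISIN" label rather than on row/column positions, this expression keeps working even if rows are reordered.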
I think response.selector was not given. Try this:
response.selector.xpath('//*[@id="overviewQuickstatsDiv"]/table/tbody/tr[5]/td[3]/text()').extract()
I have the following HTML structure
I want to extract all the links with the class:dev-link
<a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" title="Photoshoot"></a>
I am using the below code to extract the link in scrapy
response.css('.dev-link::attr(href)').extract()
I am getting the correct output, but is this the right way to use CSS selectors?
As you can see in the Scrapy documentation, there are two ways to scrape data: CSS selectors and XPath selectors. Both work correctly, but XPath takes some practice to master. In my opinion, XPath is more powerful; in special cases it lets you scrape data more easily than a CSS selector would (though of course you can usually get the same data with CSS selectors too).
What you did is correct:
link = response.css('.dev-link::attr(href)').extract_first()
and also you can get it with the following too
link = response.xpath('//a[contains(@class, "dev-link")]/@href').extract_first()
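Both selectors can be checked outside Scrapy with lxml; the snippet below uses the anchor tag from the question (with the mailto address as it appears there):

```python
import lxml.html

# The anchor from the question, wrapped in a parent element.
snippet = ('<div><a class="dev-link" href="mailto:info@jourist.com" '
           'rel="nofollow" title="Photoshoot">Contact</a></div>')
root = lxml.html.fromstring(snippet)

# Same extraction as the XPath variant of the answer.
link = root.xpath('//a[contains(@class, "dev-link")]/@href')[0]
print(link)  # mailto:info@jourist.com
```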
Suppose I obtain an href through the use of Selenium in Python.
Is there a way to build an XPath from that href and click on the matching element?
For Example:
href = '/sweatshirts/vct65b9ze/yn2gxohw4'
How would I find the XPath on that page?
When the element is for instance a link, you can use the following code:
driver.find_element_by_xpath('//a[@href="/sweatshirts/vct65b9ze/yn2gxohw4"]')
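A small sketch of the idea: build the locator string from the href variable, then verify it matches, here against a hypothetical page fragment parsed with lxml (the same XPath string would be passed to Selenium's find_element_by_xpath):

```python
import lxml.html

# The href obtained earlier (from the question).
href = '/sweatshirts/vct65b9ze/yn2gxohw4'

# Build the locator from the variable; this is the string you would
# hand to driver.find_element_by_xpath(...) before calling .click().
xpath = f'//a[@href="{href}"]'

# Hypothetical fragment standing in for the real page.
snippet = f'<div><a href="{href}">sweatshirt</a></div>'
root = lxml.html.fromstring(snippet)
matches = root.xpath(xpath)
print(len(matches))  # 1
```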
Here is an example page with pagination controlling dynamically loaded results.
http://www.rehabs.com/local/jacksonville-fl/
All that I presently know to try is:
curButton = 1
driver.find_element_by_css_selector('ul[class="pagination"]').find_elements_by_tag_name('li')[curButton].click()
Nothing seems to happen (likewise when trying to click the inner a tag, or when passing the a element's href to driver.get()).
Is there another way to access the hidden elements? For instance, when reading the html of the entire page, the elements of different pagination are shown, but are apparently inaccessible with BeautifulSoup.
Pagination was added for humans. Maybe you used the wrong XPath or CSS selector; check it.
Use this xpath:
//div[@id="listing-basic"]/article/div[@class="h3"]/a/@href
You can click on the pagination button using:
driver.find_elements_by_css_selector('.pagination li a')[1].click()
I am scraping individual listing pages from justproperty.com (individual listing from the original question no longer active).
I want to get the value of the Ref
this is my xpath:
>>> sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]
This has no results in scrapy, despite working in my browser.
The following works perfectly in lxml.html (which modern Scrapy uses):
sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')
Note that I'm using // to get between the div and the td, not laying out the explicit path. I'd have to take a closer look at the document to grok why, but the path given in that area was incorrect.
Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they change the markup. Remove the /tbody axis step and you'll receive exactly what you're looking for.
normalize-space(.//div[@class="info_div"]/table/tr/td[
normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())
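This corrected expression can be checked offline with lxml, whose HTML parser, unlike a browser, does not insert a tbody element (the markup and the "JP1234" value below are hypothetical stand-ins for the real listing page):

```python
import lxml.html

# Hypothetical info_div markup as it would appear in the page source --
# note there is no <tbody>, and lxml's parser does not add one.
snippet = '''
<div><div class="info_div"><table>
  <tr><td class="titles">Ref:</td><td> JP1234 </td></tr>
</table></div></div>
'''
root = lxml.html.fromstring(snippet)

# The tbody-free XPath from the answer; normalize-space trims whitespace.
ref = root.xpath('normalize-space(.//div[@class="info_div"]/table/tr/td['
                 'normalize-space(text())="Ref:"]/following-sibling::td[1]/text())')
print(ref)  # JP1234
```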
Read Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing? for more details.
Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]
I tried your XPath using XPath Checker and it works fine.