I've been researching this for two days now. There seems to be no simple way of doing this. I can find an element on a page by downloading the HTML with Selenium and passing it to BeautifulSoup, followed by a search via classes and strings. I want to click on this element after finding it, so I want to pass its XPath to Selenium. I have no minimal working example, only pseudo-code for what I'm hoping to do.
Why is there no function/library that lets me search through the HTML of a webpage, find an element, and then request its XPath? I can do this manually by inspecting the webpage and clicking 'Copy XPath'. I can't find any solutions to this on Stack Overflow, so please don't tell me I haven't looked hard enough.
Pseudo-Code:
# parser is a BeautifulSoup HTML object
for box in parser.find_all('span', class_="icon-type-2"):  # find all elements with a particular icon
    xpath = box.get_xpath()  # no such method exists; this is what I'm hoping for
I'm willing to change my code entirely, as long as I can locate a particular element and extract its XPath. So any other ideas on entirely different libraries are welcome.
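There is no built-in get_xpath() in BeautifulSoup, but you can hand-roll one. Below is a minimal sketch, assuming the name get_xpath (it is my own helper, not a library function): it builds a positional XPath by walking a tag's parents and counting same-named siblings at each level.

from selenium.webdriver.common.by import By

def get_xpath(element):
    # Build a positional XPath like /html[1]/body[1]/.../span[2] for a bs4 Tag.
    parts = []
    child = element
    for parent in child.parents:
        siblings = parent.find_all(child.name, recursive=False)
        # Compare by identity: bs4 tags compare equal by *content*, so
        # list.index() could return the wrong slot for identical siblings.
        index = next(i for i, s in enumerate(siblings, 1) if s is child)
        parts.append(f"{child.name}[{index}]")
        child = parent
    return "/" + "/".join(reversed(parts))

# The loop from the pseudo-code then becomes:
for box in parser.find_all('span', class_="icon-type-2"):
    driver.find_element(By.XPATH, get_xpath(box)).click()

One caveat: this only works if the HTML you fed to BeautifulSoup matches the DOM the browser actually renders; browsers insert elements (tbody, for instance), so a path computed from the raw source can miss in Selenium.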
Hi :) This is my first time here and I am new to programming.
I am currently trying to automate some work-steps, using Selenium.
There was no problem, mainly using the find_element(By.ID,'') function and clicking stuff.
But now I cannot find any element that comes after the second "html" tag on the site (see screenshot).
I tried to google this "multiple html" problem, but all I found was people saying it is not possible to have multiple html tags. I basically don't know anything about HTML, but this site seems to have more than one - there are actually three. And nothing after the first one can be found with the find_element function. Please help me with this confusion.
These "multiple html" are due to the i frames in the html code. Each iframe has its own html code. If the selector you are using is meant to find something inside one of these iframes you have to "move" your driver inside the iframe. You can find an example in this other question
I'm relatively new to using Python & Selenium. I'm trying to access NexisUni to automate a loop of searches. But once I'm in NexisUni, I struggle to locate elements -- I get a "no such element" exception. I want to locate the search bar and input my search terms.
I've read that an iframe might be present and that I need to switch frames. But I don't see any frames! Is there a way to identify frames easily -- and could a frame be present without the word "frame" in the HTML? I've also tried loading the page longer and having the driver wait, to no avail.
The HTML code is below; the grey part is the piece I'd like to select:
HTML Code
The code I'm writing to identify it is:
SearchBar = driver.find_element_by_xpath('/html/body/main/div/div[13]/div[2]/div[1]/header/div[3]/section/span[2]/span/textarea').send_keys('search text')
I've also tried these two options:
find_element_by_class_name, find_element_by_id
WebDriverWait(driver,10).until(EC.presence_of_element_located)
... Any suggestions would be appreciated!
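One way to hunt for frames (a sketch; the textarea tag is the only thing assumed from your posted XPath, and the frame may sit far above the part of the page you inspected, so it is easy to miss) is to list every frame Selenium can see and try each in turn:

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

frames = (driver.find_elements(By.TAG_NAME, "iframe")
          + driver.find_elements(By.TAG_NAME, "frame"))

for frame in frames:
    driver.switch_to.frame(frame)
    try:
        driver.find_element(By.TAG_NAME, "textarea").send_keys("search text")
        break
    except NoSuchElementException:
        # Not in this frame; climb back out and try the next one.
        # (Frames can also nest, in which case you would recurse.)
        driver.switch_to.default_content()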
For Chrome, I installed ChroPath to find elements on the page.
I want to find an XPath for the Like elements on an Instagram page, but this does not seem to work:
//span[contains(@class,'glyphsSpriteHeart__outline__24__grey_9 u-__7')]
I also tried this:
/html[1]/body[1]/div[3]/div[1]/div[2]/div[1]/article[1]/div[2]/section[1]/span[1]/button[1]/span[1]
When Selenium clicks, I get:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div._2dDPU.vCf6V div.zZYga div.PdwC2._6oveC article.M9sTE.L_LMM.JyscU div.eo2As section.ltpMr.Slqrh span.fr66n button.coreSpriteHeartOpen.oF4XW.dCJp8 > span.glyphsSpriteHeart__outline__24__grey_9.u-__7"}
How can I find the XPath? Any good extension or something?
You cannot "find" the Xpath of an element. There are many, many XPath's that will find any element. Some will be stable, others will be unstable. The decision on which Xpath to use is based upon your understanding and experience of Selenium, and you understanding of how the Application Under Test is written and behaves.
If you are looking for a tool to experiment with different XPaths, Chrome's built-in Developer Tools Console lets you test both XPath & CSS selectors: $x("//span") evaluates an XPath, $$("span") a CSS selector.
In your specific scenario of finding elements by class name, a CSS selector is a much better choice than XPath, as CSS selectors treat multiple classes as an array whereas XPath sees "class" as a single literal string, hence why you needed to use "contains".
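For example (a sketch; the class names come from the question above, and Instagram changes them often):

from selenium.webdriver.common.by import By

# CSS: each .class token matches independently, in any order.
driver.find_element(By.CSS_SELECTOR,
                    "span.glyphsSpriteHeart__outline__24__grey_9.u-__7")

# XPath: class is one literal string, so substring matching is needed.
driver.find_element(By.XPATH,
                    "//span[contains(@class, 'glyphsSpriteHeart__outline__24__grey_9')]")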
This might help:
https://selectorgadget.com/
This as well, to understand what you are manipulating:
https://www.w3schools.com/xml/xpath_syntax.asp
As for your example where you go down the tree using index numbers (i.e. /html[1]/body[1]), a slight change in the site will make your script fail. Find a way to build something more robust! Also have a look at CSS selectors if your object's appearance is known in advance.
To get all Like buttons on Instagram, use the CSS selector below:
span[aria-label="Like"]
You can get some helpful details here: https://www.w3schools.com/cssref/css_selectors.asp
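In Selenium that might look like this (a sketch, assuming the buttons' aria-label is still "Like"):

from selenium.webdriver.common.by import By

for button in driver.find_elements(By.CSS_SELECTOR, 'span[aria-label="Like"]'):
    button.click()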
I am trying to create a "universal" XPath, so that when I run the spider, it will be able to download the hotel name for each hotel on the list.
This is the XPath that I need to convert:
//*[#id="offerPage"]/div[3]/div[1]/div[1]/div/div/div/div/div[2]/div/div[1]/h3/a
Can anyone point me the right direction?
This is the example of how they did it in the Scrapy docs:
https://github.com/scrapy/quotesbot/blob/master/quotesbot/spiders/toscrape-xpath.py
For the text, they have:
'text': quote.xpath('./span[@class="text"]/text()').extract_first(),
When you open "http://quotes.toscrape.com/" and copy the XPath for the text, you will get:
/html/body/div/div[2]/div[1]/div[1]/span[1]
When you look at the HTML that you are scraping, just using "Copy XPath" from the browser's source viewer is not enough.
You need to look at the attributes that the html tags have.
Of course, using just tag types as an XPath can work, but what if not every page you are going to scrape follows that pattern?
The Scrapy example you are using relies on the span's class attribute to point precisely at the target tag.
I suggest reading a bit more about XPath (for example here) to understand how flexible your search patterns can be.
If you want to go even broader, reading about DOM structure will also be useful. Let us know if you need more pointers.
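As a sketch of that idea applied to the hotel-name XPath from the question (the offerPage id and the h3/a tail come from that path; whether every hotel name on the page sits in an h3/a is an assumption you would verify against the real markup):

# Anchor on the stable id and let // skip the brittle
# chain of positional div[3]/div[1]/... steps.
names = response.xpath('//*[@id="offerPage"]//h3/a/text()').extract()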
I am scraping individual listing pages from justproperty.com (the individual listing from the original question is no longer active).
I want to get the value of the Ref field.
This is my XPath:
>>> sel.xpath('normalize-space(.//div[#class="info_div"]/table/tbody/tr/td[norma
lize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]
This has no results in scrapy, despite working in my browser.
The following works perfectly in lxml.html (which modern Scrapy uses):
sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')
Note that I'm using // to get between the div and the td, not laying out the explicit path. I'd have to take a closer look at the document to grok why, but the path given in that area was incorrect.
Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they change the markup (browsers insert tbody elements that are not necessarily in the source). Remove the /tbody axis step and you'll receive exactly what you're looking for.
normalize-space(.//div[@class="info_div"]/table/tr/td[
    normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())
Read Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing? for more details.
Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]
I tried your XPath using XPath Checker and it works fine.