Here is the HTML of the page.
The XPath of the search-results-list container grid is
//*[@id="product_type_products_list"]/div/div[2]/div
and the XPath of a single result is
//*[@id="product_type_products_list"]/div/div[2]/div/div[1]
I have tried using:
elems = driver.find_elements_by_xpath('//*[@id="product_type_products_list"]/div/div[2]/div')
url = driver.find_element_by_link_text(elems[0].text).get_attribute("href")
print(url)
This gives the link to the start of the website.
Thank you for your consideration.
The code you've provided doesn't look like valid HTML to me; however, you can try the following XPath expression:
//div[@class='result']/descendant::a
More information:
XPath Tutorial
XPath Axes
XPath Functions and Operators
Try narrowing it down to the <a> tag by extending the XPath like so:
elems = driver.find_elements_by_xpath('.//*[@id="product_type_products_list"]/div/div[2]/div/div[1]/a')
Then just retrieve the href attribute like you did earlier but using the same element:
url = elems[0].get_attribute("href")
I have this in my code:
link_tag = "//div[@class= 'yuRUbf']//a/@href"
With this code I get this error
The result of the xpath expression "//div[@class= 'yuRUbf']//a/@href" is: [object Attr]. It should be an element.
I don't know any other way to scrape the URL from that div class. How can I fix this?
This code works fine in the scraper Chrome extension but not in Python.
You have to change your XPath so that it selects an element node, as the error message suggests, rather than an attribute node, and then read the attribute from that element. So use
link_tag = "//div[@class= 'yuRUbf']//a"
links = driver.find_elements_by_xpath(link_tag)
and then extract the attribute with
links[0].get_attribute("href")
to get the @href attribute of the first matching element.
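The same two-step idea (select the element, then read its attribute) can be tried without a browser. The following is only a minimal sketch using Python's standard-library xml.etree.ElementTree instead of Selenium; the markup and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page (assumption, not the actual site).
SAMPLE = """
<html><body>
  <div class="yuRUbf"><a href="https://example.com/first">First</a></div>
  <div class="yuRUbf"><span><a href="https://example.com/second">Second</a></span></div>
  <div class="other"><a href="https://example.com/ignored">Ignored</a></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Step 1: select element nodes (the <a> tags), not the @href attribute nodes.
anchors = root.findall(".//div[@class='yuRUbf']//a")

# Step 2: read the attribute from each selected element.
hrefs = [a.get("href") for a in anchors]
print(hrefs)
```

With Selenium the shape is the same: find_elements returns the <a> elements, and get_attribute("href") plays the role of a.get("href").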
This solution might work. Note that the href lives on the <a> inside the div, not on the div itself, so select the anchor:
a = driver.find_element(By.CSS_SELECTOR, "div[class='yuRUbf'] a")
url = a.get_attribute("href")
I am trying to get the href with selenium and python.
This is my page:
Some of the class attributes change from element to element, so I am basically trying to get every href for <a id="job____ .....
links.append(job.find_element_by_xpath('//a[@aria-live="polite"]//span').get_attribute(name="href"))
I have tried a couple of things but can't figure it out. How can I get all the hrefs from the screenshot above?
Try this, but note that your XPath
"//a[@aria-live="polite"]//span"
selects a span, and I don't see any span with an href in your HTML. Maybe this XPath solves it:
//a[./span[@aria-live="polite"]]
links.append(job.find_element_by_xpath('//a[./span[@aria-live="polite"]]').get_attribute("href"))
But that won't get all the URLs. To do that, use find_elements (which returns a list) and extend your URL list with a list comprehension:
links.extend([x.get_attribute("href") for x in job.find_elements_by_xpath('//a[./span[@aria-live="polite"]]')])
Edit 1: another XPath solution
links.extend(["website_base_url"+x.get_attribute("href") for x in job.find_elements_by_xpath('//a[contains(@id, "job_")]')])
list_of_elements_with_href = wd.find_elements_by_xpath("//a[contains(@href,'')]")
for el_with_href in list_of_elements_with_href:
    links.append(el_with_href.get_attribute("href"))
or, if you need to be more specific:
list_of_elements_with_href = wd.find_elements_by_xpath("//a[contains(@href,'') and contains(@id,'job_')]")
Based on your description and attached image, I think you have got the wrong xpath. Try the following code.
find_links = driver.find_elements_by_xpath("//a[starts-with(@id,'job_')]")
links = []
for link in find_links:
    links.append(link.get_attribute("href"))
Please note that this uses find_elements_by_xpath (plural), not find_element_by_xpath.
I am unable to test this solution as you have not provided the website.
I have been trying to solve this issue for a while and tried several solutions posted here before opening this question.
I am currently trying to run a scraper with the following code
website = 'https://www.abitareco.it/nuove-costruzioni-milano.html'
path = Path().joinpath('util', 'chromedriver')
driver = webdriver.Chrome(path)
driver.get(website)
main = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "p1")))
The hyperlinks I am after have the word scheda in them:
i = driver.find_element_by_xpath('.//a[contains(@href, "scheda")]')
i.text
My first issue is that find_element_by_xpath only returns a single hyperlink, and my second issue is that it is not extracting anything so far.
I'd appreciate any help and/or guidance.
You need to use find_elements instead:
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.text)
Note that find_elements returns a list of web elements, whereas find_element returns a single web element.
If you are specifically looking for the href attribute, you can try the code below:
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.get_attribute('href'))
There are two issues, looking at the website.
You want to find all elements, not just one, so you need to use find_elements, not find_element.
The anchors don't actually have any text in them, so .text won't return anything.
Assuming what you want is to scrape the URLs of all these links, you can use .get_attribute('href') instead of .text, like so:
url_list = driver.find_elements(By.XPATH, './/a[contains(@href, "scheda")]')
for i in url_list:
    print(i.get_attribute('href'))
This finds all web elements that match your criteria and stores them in a list. I just used print as an example, but obviously you may want to do more than just print the links.
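If you want to do more than print, a small helper can turn the list of elements into a clean list of URLs. This is only a sketch: collect_hrefs is a name made up here, and FakeElement is a hypothetical stand-in for a Selenium WebElement so the snippet can run without a browser. The only thing it relies on is that each element has a get_attribute method.

```python
def collect_hrefs(elements):
    """Return unique, non-empty href values in the order they first appear."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        # Skip anchors without an href and URLs we have already collected.
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


class FakeElement:
    """Hypothetical stand-in for a WebElement, used only to exercise the helper."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


sample = [
    FakeElement("https://example.com/scheda-1"),
    FakeElement(None),
    FakeElement("https://example.com/scheda-1"),
    FakeElement("https://example.com/scheda-2"),
]
result = collect_hrefs(sample)
print(result)
```

In a real script you would pass in the list returned by find_elements rather than the fake elements.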
I am trying to get links whose title contains a certain word and at the same time does not contain certain other words. I use the following code, but Selenium says it is not a valid XPath expression.
Please find my code here:
Any help will be highly appreciated!
driver.get("http://www.csisc.cn/zbscbzw/isinbm/index_list_code.shtml")
while True:
    links = [link.get_attribute('href') for link in driver.find_elements_by_xpath("//a[(contains(@title,'公司债券')and not(contains(@title,'短期'))]")]
    for link in links:
        driver.get(link)
        #dosth
There is an extra bracket in your XPath; use
links = [link.get_attribute('href') for link in driver.find_elements_by_xpath("//a[contains(@title,'公司债券')and not(contains(@title,'短期'))]")]
instead
You can use Chrome developer tools first to validate your XPaths
PS: I changed the xpath here a bit to be able to find some elements in my page
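One way to avoid mismatched brackets is to build the predicate from parts instead of typing it by hand. The following is a rough sketch only: title_filter_xpath is a made-up helper name, and it assumes the words contain no single quotes (no escaping is done).

```python
def title_filter_xpath(include, exclude):
    """Build an //a XPath whose @title contains every `include` word and no `exclude` word.

    Assumption: the words contain no single quotes; nothing is escaped here.
    """
    clauses = [f"contains(@title,'{word}')" for word in include]
    clauses += [f"not(contains(@title,'{word}'))" for word in exclude]
    if not clauses:
        # No conditions at all: match every anchor.
        return "//a"
    # Joining with " and " keeps a space on both sides of the operator.
    return "//a[" + " and ".join(clauses) + "]"


xpath = title_filter_xpath(["公司债券"], ["短期"])
print(xpath)
```

The resulting string can be passed to find_elements_by_xpath in place of the hand-written expression.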
There should be a space before and. There is also an extra leading bracket in your XPath. Try:
"//a[contains(@title,'公司债券') and not(contains(@title,'短期'))]"
I am trying to write a program that writes code for me. Imagine I have a UL list on a website and I need to collect the XPath selector of each element in the list. Is there an easy way to tell Python to grab the XPath selectors for all of the elements in the UL?
For example, we have this UL
<ul id="test">
<li>Zurich</li>
<li>Geneva</li>
<li>Winterthur</li>
<li>Lausanne</li>
<li>Lucerne</li>
</ul>
And I have this code in Python
ul = driver.find_elements_by_id('test')
for element in ul:
    selector = **find the xpath/selector**
    text = element.text
How can I get the XPath of each item in the UL?
Thank you!
Edit: This is the best solution I have found, but it uses several other modules. Is there any way to do this with only Selenium?
lxml can auto-generate an absolute XPath for you using the getpath() method.
Example (using the Wikipedia main page, getting the XPath expression for the logo):
from urllib.request import urlopen
from lxml import etree
# Fetch the page and parse it with an HTML parser, since real pages are rarely well-formed XML
data = urlopen("https://en.wikipedia.org")
tree = etree.parse(data, etree.HTMLParser())
element = tree.xpath('//div[@id="p-logo"]/a')[0]
print(tree.getpath(element))
Try the XPath below:
li = driver.find_elements_by_xpath('//ul[@id="test"]/li')
for element in li:
    text = element.text
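If what you need is a selector string per item rather than the element itself, one simple option is to build positional XPaths from the item count. This is a sketch under assumptions: indexed_xpaths is a made-up helper, and it assumes the <li> elements are direct children of the <ul>, as in the example markup.

```python
def indexed_xpaths(container_xpath, child_tag, count):
    """Return one positional XPath per child element.

    XPath positions start at 1, so the first child is child_tag[1].
    """
    return [f"{container_xpath}/{child_tag}[{i}]" for i in range(1, count + 1)]


# 5 matches the number of <li> items in the example list.
selectors = indexed_xpaths('//ul[@id="test"]', "li", 5)
for selector in selectors:
    print(selector)
```

With Selenium, count would be the length of the list returned by find_elements_by_xpath('//ul[@id="test"]/li'), and each generated selector can be paired with the matching element's text.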