I am using Python Selenium to look through some HTML and find elements. I have the following HTML saved into Python...
<section id="categories">
<ul id="category_list">
<li id="category84">
Sample Category
</li>
<li id="category984">
Another Category
</li>
<li id="category22">
My Sample Category
</li>
</ul>
</section>
I can find the categories section easily enough, but now I would like to loop through each list item and save its name and href link into an array.
Does anyone have a similar example I can look at?
Sure. Use a CSS selector locator and a list comprehension, calling .get_attribute("href") to get the link and .text to get the link text:
categories = driver.find_elements_by_css_selector("#categories #category_list li[id^=category] a")
result = [{"link": category.get_attribute("href"), "text": category.text}
          for category in categories]
print(result)
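Note that the find_elements_by_* helpers were deprecated and later removed in Selenium 4. A sketch of the same lookup in the newer style (assuming, as the selector above does, that the list items contain anchors):
from selenium.webdriver.common.by import By

# Selenium 4 style: same CSS selector, newer find_elements API
categories = driver.find_elements(By.CSS_SELECTOR, "#categories #category_list li[id^=category] a")
result = [{"link": category.get_attribute("href"), "text": category.text}
          for category in categories]
print(result)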
I'd like to click on the following link but it's not working:
<ul class="pagination pagination-large">
<li style="display:inline;">
<a name="Next" href="jamm/flavours/page=2" class="next">
<span class="icon-navigate_next"></span>
</a>
</li>
</ul>
My code
items = browser.find_elements_by_xpath("//ul[@class = 'pagination pagination-large']//li[@style = 'display:inline;']")
print(items)
for k in items:
    print(k)
    k.click()
    print("clicked")
    k.send_keys(webdriver.common.keys.Keys.SPACE)
I think the problem is that the XPath you are using does not find the element you need; it targets the list item, not the anchor.
Maybe you can try identifying the link by its CSS class next instead:
items = browser.find_elements_by_class_name('next')
for item in items:
    item.click()
If that works, you can either keep it as is (as long as no other elements use that class) or go back and fix your XPath.
Try:
browser.find_elements_by_name('Next')
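If you would rather keep the XPath approach, here is a sketch that targets the anchor itself rather than the list item, based on the markup shown above:
# Click the <a class="next"> inside the pagination list, not the <li>
next_link = browser.find_element_by_xpath("//ul[@class = 'pagination pagination-large']//li/a[@class = 'next']")
next_link.click()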
Guys, I have a question about Scrapy, Selector and XPath.
I would like to select the link in the "a" tag inside the last "li" tag of the HTML below. How should I write the XPath query?
I did it like this, but I believe there is a simpler way, such as expressing it in a single XPath query instead of indexing into the result list; I just don't know how to write it:
from scrapy import Selector
sel = Selector(text=html)
print(sel.xpath('(//ul/li)').xpath('a/@href').extract()[-1])
where html contains:
<ul>
<li>
<a href="/info/page/" rel="follow">
<span class="page-numbers">
35
</span>
</a>
</li>
<li>
<a href="/info/page/" rel="follow">
<span class="next">
next page.
</span>
</a>
</li>
</ul>
I am assuming you specifically want the link to the "next" page. If that is the case, you can locate the a element by checking that its child span has the "next" class:
//a[span/@class = "next"]/@href
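A minimal sketch applying that expression with the Selector from your own snippet (assuming the same html string), so no list indexing is needed:
from scrapy import Selector

sel = Selector(text=html)
# One XPath query instead of selecting all li elements and slicing
print(sel.xpath('//a[span/@class = "next"]/@href').extract_first())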
I'm using Scrapy to get some data from a website.
I have the following list of links:
<li class="m-pagination__item">
10
</li>
<li class="m-pagination__item">
<a href="?isin=IT0000072618&lang=it&page=1">
<span class="m-icon -pagination-right"></span>
</a>
I want to extract the href attribute of only the 'a' element that contains the span with class="m-icon -pagination-right".
I've looked at some XPath examples, but I'm not an XPath expert and I couldn't find a solution.
Thanks.
//a[span/@class = 'm-icon -pagination-right']/@href
With a Scrapy response:
response.css('span.m-icon').xpath('../@href')
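A small self-contained sketch of both forms, run against the snippet from the question with scrapy.Selector rather than a live response (.get() is the newer spelling of .extract_first()):
from scrapy import Selector

html = '''
<li class="m-pagination__item">
<a href="?isin=IT0000072618&lang=it&page=1">
<span class="m-icon -pagination-right"></span>
</a>
</li>
'''

sel = Selector(text=html)
# XPath form
print(sel.xpath("//a[span/@class = 'm-icon -pagination-right']/@href").get())
# CSS + XPath form: select the span, then step up to the parent a
print(sel.css('span.m-icon').xpath('../@href').get())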
I have some HTML like the following
<li class="expandSubItem">
<span class="expandSubLink">Popular Neighborhoods</span>
<ul class="secondSubNav" style="top:-0.125em;">
<li class="subItem">
<a class="subLink" href="/Hotels-g187147-zfn7236765-Paris_Ile_de_France-Hotels.html">Quartier Latin Hotels</a>
</li>
</ul>
</li>
<li class="expandSubItem">
<span class="expandSubLink">Popular Paris Categories</span>
<ul class="secondSubNav" style="top:-0.125em;">
<li class="subItem">
<a class="subLink" href="/HotelsList-Paris-Cheap-Hotels-zfp10420.html">Paris Cheap Hotels</a>
</li>
</ul>
</li>
I want to get all links under "Popular Paris Categories". I used something like //li//a/@href/following::span[text()='Popular Singapore Categories'], but it gave no results. Any idea how to get the correct result? Here is the snippet of the Python code I wrote.
import requests
from lxml import html

t_url = 'https://www.tripadvisor.com/Tourism-g187147-Paris_Ile_de_France-Vacations.html'
page = requests.get(t_url, timeout=30)
tree = html.fromstring(page.content)
links = tree.xpath('//li[span="Popular Paris Categories"]//a/@href')
print(links)
This is one possible way:
//li[normalize-space(span)="Popular Paris Categories"]//a/@href
Notice that normalize-space() strips the leading and trailing whitespace from the span content. That whitespace is the reason the XPath I initially suggested in the comment didn't work against your actual HTML.
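A minimal sketch wiring that expression into the lxml code from the question (same URL, same parsing; whether it returns links still depends on the live page markup):
import requests
from lxml import html

t_url = 'https://www.tripadvisor.com/Tourism-g187147-Paris_Ile_de_France-Vacations.html'
tree = html.fromstring(requests.get(t_url, timeout=30).content)
# normalize-space() makes the comparison robust to surrounding whitespace in the span
links = tree.xpath('//li[normalize-space(span)="Popular Paris Categories"]//a/@href')
print(links)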
Something like this, perhaps:
//span[text()='Popular Paris Categories']/following-sibling::ul//a/@href
I have an HTML file which looks something like this:
<html>
...
<li class="not a user"> </li>
<li class="user">
<a href="abs" ...> </a>
</li>
<li class="user">
<a href="bss" ...> </a>
</li>
...
</html>
Given the above input, I want to parse the li tags with class="user" and get the values of their href attributes as output. Is this possible using BeautifulSoup in Python?
My solution was:
data="the above html code snippet"
soup=BeautifulSoup(data)
listset=soup("li","user")
for list in listset:
attrib_value=[a['href'] for a in list.findAll('a',{'href':True})]
Obviously I have an error somewhere, because it only lists the attribute value for the last anchor tag's href.
Your code is fine. There are three elements in listset, and attrib_value gets overwritten on each iteration of your loop, so at the end of the program it only contains the href values from the last element of listset, which is bss.
Try this instead to keep all values:
attrib_value += [a['href'] for a in list.findAll('a', {'href': True})]
and initialize attrib_value to the empty list before the loop (attrib_value = []).
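Put together, a minimal self-contained sketch of the corrected version (html_snippet is a stand-in for your actual file contents):
from bs4 import BeautifulSoup

# Stand-in for the HTML shown in the question
html_snippet = '''
<li class="not a user"> </li>
<li class="user"><a href="abs"> </a></li>
<li class="user"><a href="bss"> </a></li>
'''

soup = BeautifulSoup(html_snippet, "html.parser")
listset = soup("li", "user")

attrib_value = []  # start with an empty list
for item in listset:
    # accumulate instead of overwriting on each iteration
    attrib_value += [a['href'] for a in item.findAll('a', {'href': True})]

print(attrib_value)  # ['abs', 'bss']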