I'm having value like
<a href="/for-sale/property/abu-dhabi/page-3/" title="Next" class="b7880daf"><div title="Next" class="ea747e34 ">
I need to pull out only ""Next" from title="Next" for the one i used
soup.find('a',attrs={"title": "Next"}).get('title')
is there any method to get the tittle value with out using .get("title")
My code
next_page_text = soup.find('a',attrs={"title": "Next"}).get('title')
Output:
Next
I need:
next_page_text = soup.find('a',attrs={"title": "Next"})
Output:
Next
Please let me know if there is any method to find.
You should get Next.Try this. Using find() or select_one() and Use If to check if element is present on a page.
from bs4 import BeautifulSoup
import requests
res=requests.get("https://www.bayut.com/for-sale/property/abu-dhabi/page-182/")
soup=BeautifulSoup(res.text,"html.parser")
if soup.find("a", attrs={"title": "Next"}):
print(soup.find("a", attrs={"title": "Next"})['title'])
If you want to use css selector.
if soup.select_one("a[title='Next']"):
print(soup.select_one("a[title='Next']")['title'])
I'm re-writing my answer as there was confusion in your original post.
If you'd like to take the URL associated with the Next tag:
soup.find('a', title='Next')['href']
['href'] can be replaced with any other attribute in the element, so title, itemprop etc.
If you'd like to select the element with Next in the title:
soup.find('a', title='Next')
Related
I would like to select the second element by specifying the fact that it contains "title" element in it (I don't want to just select the second element in the list)
sample = """<h5 class="card__coins">
<a class="link-detail" href="/en/coin/smartcash">SmartCash (SMART)</a>
</h5>
<a class="link-detail" href="/en/event/smartrewards-812" title="SmartRewards">"""
How could I do it?
My code (does not work):
from bs4 import BeautifulSoup
soup = BeautifulSoup(sample.content, "html.parser")
second = soup.find("a", {"title"})
for I in soup.find_all('a', title=True):
print(I)
After looping through all a tags we are checking if it contains title attribute and it will only print it if it contains title attribute.
Another way to do this is by using CSS selector
soup.select_one('a[title]')
this selects the first a element having the title attribute.
I am trying to select an HTML element with an id that is a number, ie. <div id=27047243>
When I try to use the select method like this soup.select("#27047243") I get an error that says Malformed id selector
I figured I need to escape the number somehow, I tried like this soup.select(r"#\3{number}" but even though I did not get the error anymore, I could not get the element
I know I could use the find method soup.find(id="27047243") and that works, but the problem is I need to go deeper into nested elements so I want to know if there is a way how to do this using 'select' so I can use CSS selectors
You can use div[id="27047243"]:
from bs4 import BeautifulSoup
html_doc = """
<div id=27047243>
I want this.
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.select_one('div[id="27047243"]'))
Prints:
<div id="27047243">
I want this.
</div>
You can try this:
soup.select_one('div#27047243')
I have such HTML code:
<li class="IDENTIFIER"><h5 class="hidden">IDENTIFIER</h5><p>
<span class="tooltip-iws" data-toggle="popover" data-content="SOME TEXT">
other text</span></p></li>
And I'd like to obtain the SOME TEXT from the data-content.
I wrote
target = soup.find('span', {'class' : 'tooltip-iws'})['data-content']
to get the span, and I wrote
identifier_elt= soup.find("li", {'class': 'IDENTIFIER'})
to get the class, but I'm not sure how to combine the two.
But the class tooltip-iws is not unique, and I would get extraneous results if I just used that (there are other spans, before the code snippet, with the same class)
That's why I want to specify my search within the class IDENTIFIER. How can I do that in BeautifulSoup?
try using css selector,
soup.select_one("li[class='IDENTIFIER'] > p > span")['data-content']
Try using selectorlib, should solve your issue, comment if you need further assistance
https://selectorlib.com/
>>> soup_brand
<a data-role="BRAND" href="/URL/somename">
Some Name
</a>
>>> type(soup_brand)
<class 'bs4.BeautifulSoup'>
>>> print(soup_brand.get('href'))
None
Documentation followed: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Hi people from all over the world,
does someone now whats going wrong or am I targeting the object wrong ?
Need to get the href.
Have you tried:
soup.find_all(name="a")
or
soup.select_one(selector="a")
it should also be possible to catch with
all_anchor_tags = soup.find_all(name="a")
for tag in all_anchor_tags:
print(tag.get("href")) #prints the href element of each a tag, thus each link
Although the all bs4 looks for multiple elemnts (the reason why we have a loop here) I encountered, that bs4 sometime is better in catching things, if you give it a search for all approach and then iterate over the elements
in order to apply ['href'] the object must be <bs4.Element.Tag>.
so, try this:
string = \
"""
<a data-role="BRAND" href="/URL/somename">
Some Name
</a>
"""
s = BeautifulSoup(string)
a_tag = s.find('a')
print(a_tag["href"])
out
/URL/somename
or if you have multiple a tags you can try this:
a_tags = s.findAll('a')
for a in a_tags:
print(a.get("href"))
out
/URL/somename
Suppose we have the html code as follows:
html = '<div class="dt name">abc</div><div class="name">xyz</div>'
soup = BeautifulSoup(html, 'lxml')
I want to get the name xyz. Then, I write
soup.find('div',{'class':'name'})
However, it returns abc.
How to solve this problem?
The thing is that Beautiful Soup returns the first element that has the class name and div so the thing is that the first div has class name and class dt so it selects that div.
So, div helps but it still narrows down to 2 divs. Next, it returns a array so check the second div to use print(soup('div')[1].text). If you want to print all the divs use this code:
for i in range(len(soup('div')))
print(soup('div')[i].text)
And as pointed out in Ankur Sinha's answer, if you want to select all the divs that have only class name, then you have to use select, like this:
soup.select('div[class=name]')[0].get_text()
But if there are multiple divs that satisfy this property, use this:
for i in range(len(soup.select('div[class=name]'))):
print(soup.select('div[class=name]')[i].get_text())
Just to continue Ankur Sinha, when you use select or even just soup() it forms a array, because there can be multiple items so that's why I used len(), to figure out the length of the array. Then I ran a for loop on it and then printed the select function at i starting from 0.
When you do that, it rather would give a specific div instead of a array, and if it gave out a array, calling get_text() would produce errors because the array is NOT text.
This blog was helpful in doing what you would like, and that is to explicitly find a tag with specific class attribute:
from bs4 import BeautifulSoup
html = '<div class="dt name">abc</div><div class="name">xyz</div>'
soup = BeautifulSoup(html, 'html.parser')
soup.find(lambda tag: tag.name == 'div' and tag['class'] == ['name'])
Output:
<div class="name">xyz</div>
You can do it without lambda also using select to find exact class name like this:
soup.select("div[class = name]")
Will give:
[<div class="name">xyz</div>]
And if you want the value between tags:
soup.select("div[class=name]")[0].get_text()
Will give:
xyz
In case you have multiple div with class = 'name', then you can do:
for i in range(len(soup.select("div[class=name]"))):
print(soup.select("div[class=name]")[i].get_text())
Reference:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
This might work for you, note that it is contingent on the div being the second div item in the html.
import requests
from bs4 import BeautifulSoup
html = '<div class="dt name">abc</div><div class="name">xyz</div>'
soup = BeautifulSoup(html, features='lxml')
print(soup('div')[1].text)