I want to scrape 2015 from below HTML:
I use the below code but am only able to scrape "Annee"
soup.find('span', {'class':'optionLabel'}).get_text()
Can someone please help?
I am a new learner.
Simply try to find its next span that holds the text you wanna scrape:
soup.find('span', {'class':'optionLabel'}).find_next('span').get_text()
or css selectors with adjacent sibling combinator:
soup.select_one('span.optionLabel + span').get_text()
Example
html='''
<span class="optionLabel"><button>Année</button</span> :
<span>2015</span>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
soup.find('span', {'class':'optionLabel'}).find_next('span').get_text()
Output
2015
Related
I am a beginner in Python3, I am working on selenium project for a website
the text that i want is under the path ("//div[#class='classname']//span[#class='classname2']).text
but i cannot extract it without a beautifulsoup
for i in postsContainer.extract():
soup = bs(i)
people.append([soup.find("div",{"class":"classname"}).text])
but It doesn't work without the //span part. How can I insert my path in a beautifulsoup?
If someone can help
If there would be some more html to inspect, we would maybe find a better solution, but you can use the css selectors in this case
soup.select_one('div.css-901oao.r-18jsvk2.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-bnwqim.r-qvutc0 > span.css-901oao.css-16my406.r-poiln3.r-bcqeeo r-qvutc0').get_text()
or:
soup.select_one('div.css-901oao.r-18jsvk2.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-bnwqim.r-qvutc0 > span').get_text()
Example
from bs4 import BeautifulSoup
html='''
<div class="classname">
<span class="classname2">text</span>
</div>
'''
soup = BeautifulSoup(html,'html.parser')
soup.select_one('div.classname span.classname2').get_text()
I have the following html piece
soup = <span class="posting-location go-to-posting">
Santa Gertrudes ,
<span> Tatuapé, São Paulo</span>
</span>
I know that, to access the "Tatuapé, São Paulo", I can use
soup.select_one('span')
However, how do I select "Santa Gertrudes , "?
I'm using BeautifulSoup to parse the HTML you provided.
Then I navigate the soup using the spans. After I have the target element, I get the text of the element.
soup.span.span.text
or
This finds all spans and selects the second one.
soup.find_all('span')[1]
I have this additional code before calling either of those.
from bs4 import BeautifulSoup
html = "<span class="posting-location go-to-posting">Santa Gertrudes , <span> Tatuapé, São Paulo</span></span>"
soup = BeautifulSoup(html, 'html.parser')
I'm working on a personal project where I scrape data from a website. I'm trying to use beautiful soup to do this but I came across data in the same class but a different attribute. For example:
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
How do I just get $11.99/kg? Right now I'm getting
$11.99 /kg
$5.44 /lb.
I've done x.select('.pi--secondary-price') but it returns both prices. How do I only get 1 price ($11.99 /kg)?
You could first get the <abbr> tag and then search for the respective parent tag. Like this:
from bs4 import BeautifulSoup
html = '''
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
kg = soup.find(title="Kilogram")
print(kg.parent.text)
This gives you the desired output $11.99 /kg. For more information, see the BeautifulSoup documentation.
I have to scrape a web page using BeautifulSoup in python.So to extract the complete div which hass the relavent information and looks like the one below:
<div data-v-24a74549="" class="row row-mg-mod term-row">
I wrote soup.find('div',{'class':'row row-mg-mod term-row'}).
But it is returning nothing.I guess it is something to do with this data-v value.
Can someone tell the exact syntaxof scraping this type of data?
Give this a try:
from bs4 import BeautifulSoup
content = """
<div data-v-24a74549="" class="row row-mg-mod term-row">"""
soup = BeautifulSoup(content,'html.parser')
for div in soup.find_all("div", {"class" : "row"}):
print(div)
I want to scrape the text ("Showing 650 results") from a website.
The result of I am looking for is:
Result : Showing 650 results
The following is the Html code:
<div class="jobs-search-results__count-sort pt3">
<div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
Showing 650 results
</div>
Python code:
response = requests.get(index_url)
soup = BeautifulSoup(response.text, 'html.parser')
text = {}
link = "jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4"
for div in soup.find_all('div',attrs={"class" : link}):
text[div.text]
text
So far it looks like my code is not working.
You don't need soup.find_all if you're looking for one element only, soup.find works just as well
You can use tag.string/tag.contents/tag.text to access inner text
div = soup.find('div', {"class" : link})
text = div.string
Old: from BeautifulSoup import BeautifulSoup
"Development on the 3.x series of Beautiful Soup ended in 2011, and the series will be discontinued on January 1, 2021, one year after the Python 2 sunsetting date."
New: from bs4 import BeautifulSoup
"Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree."