Scraping <span> text with BeautifulSoup and urllib - python

I want to scrape 2015 from the HTML below:
I use the code below, but I am only able to scrape "Annee":
soup.find('span', {'class':'optionLabel'}).get_text()
Can someone please help?
I am a new learner.

Simply find the next span, which holds the text you want to scrape:
soup.find('span', {'class':'optionLabel'}).find_next('span').get_text()
or use a CSS selector with the adjacent sibling combinator:
soup.select_one('span.optionLabel + span').get_text()
Example
html='''
<span class="optionLabel"><button>Année</button></span> :
<span>2015</span>'''
from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('span', {'class':'optionLabel'}).find_next('span').get_text())
Output
2015
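As a minimal sketch of the alternatives mentioned above, the snippet below runs find_next, find_next_sibling, and the adjacent sibling selector against the same sample HTML. The sample markup is the one from this answer, not the asker's real page, so the selectors may need adjusting there.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the answer above (assumed to resemble the real page)
html = '''
<span class="optionLabel"><button>Année</button></span> :
<span>2015</span>'''

soup = BeautifulSoup(html, 'html.parser')
label = soup.find('span', {'class': 'optionLabel'})

# find_next searches forward in document order, including nested tags
via_find_next = label.find_next('span').get_text(strip=True)

# find_next_sibling only looks at later nodes on the same level as the label
via_sibling = label.find_next_sibling('span').get_text(strip=True)

# "+" selects the element that immediately follows span.optionLabel
via_css = soup.select_one('span.optionLabel + span').get_text(strip=True)

print(via_find_next, via_sibling, via_css)
```

All three reach the same span here; find_next_sibling and the CSS form are a bit stricter because they will not match a span nested somewhere deeper in the page.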

Related

beautiful soup find text of path that contains div and span

I am a beginner in Python 3 and I am working on a Selenium project for a website.
The text that I want is under the path ("//div[@class='classname']//span[@class='classname2']").text
but I cannot extract it without BeautifulSoup:
for i in postsContainer.extract():
    soup = bs(i)
    people.append([soup.find("div",{"class":"classname"}).text])
But it doesn't work without the //span part. How can I use my path in BeautifulSoup?
Any help would be appreciated.
With more HTML to inspect we could maybe find a better solution, but you can use CSS selectors in this case:
soup.select_one('div.css-901oao.r-18jsvk2.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-bnwqim.r-qvutc0 > span.css-901oao.css-16my406.r-poiln3.r-bcqeeo.r-qvutc0').get_text()
or:
soup.select_one('div.css-901oao.r-18jsvk2.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-bnwqim.r-qvutc0 > span').get_text()
Example
from bs4 import BeautifulSoup
html='''
<div class="classname">
<span class="classname2">text</span>
</div>
'''
soup = BeautifulSoup(html,'html.parser')
soup.select_one('div.classname span.classname2').get_text()

Extract text from a class in HTML using CSS language

I have the following html piece
soup = <span class="posting-location go-to-posting">
Santa Gertrudes ,
<span> Tatuapé, São Paulo</span>
</span>
I know that, to access the "Tatuapé, São Paulo", I can use
soup.select_one('span')
However, how do I select "Santa Gertrudes , "?
I'm using BeautifulSoup to parse the HTML you provided, then I navigate the soup through the spans. Once I have the target element, I get its text.
soup.span.span.text
Alternatively, this finds all spans and selects the second one:
soup.find_all('span')[1]
Both expressions reach the inner span ("Tatuapé, São Paulo").
I have this additional code before calling either of those:
from bs4 import BeautifulSoup
# Single quotes outside, because the HTML attribute uses double quotes
html = '<span class="posting-location go-to-posting">Santa Gertrudes , <span> Tatuapé, São Paulo</span></span>'
soup = BeautifulSoup(html, 'html.parser')
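For the part the question actually asks about ("Santa Gertrudes , "), one possible approach is to read only the outer span's own text node and skip the nested span. This is a sketch against the HTML from the question; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '<span class="posting-location go-to-posting">Santa Gertrudes , <span> Tatuapé, São Paulo</span></span>'
soup = BeautifulSoup(html, 'html.parser')

outer = soup.select_one('span.posting-location')

# string=True matches text nodes; recursive=False restricts the search
# to direct children, so the text inside the nested span is ignored
own_text = outer.find(string=True, recursive=False)

# The nested span is still reachable as before
inner_text = outer.span.get_text(strip=True)

print(own_text.strip())
print(inner_text)
```

If the outer span could contain several text nodes around the inner one, outer.find_all(string=True, recursive=False) would return all of them.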

How to extract data(text) using beautiful soup when they are in the same class?

I'm working on a personal project where I scrape data from a website. I'm trying to use beautiful soup to do this but I came across data in the same class but a different attribute. For example:
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
How do I just get $11.99/kg? Right now I'm getting
$11.99 /kg
$5.44 /lb.
I've done x.select('.pi--secondary-price') but it returns both prices. How do I only get 1 price ($11.99 /kg)?
You could first get the <abbr> tag and then search for the respective parent tag. Like this:
from bs4 import BeautifulSoup
html = '''
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
kg = soup.find(title="Kilogram")
print(kg.parent.text)
This gives you the desired output $11.99 /kg. For more information, see the BeautifulSoup documentation.
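If the kilogram price is always the first span, a shorter alternative might be select_one, which returns only the first match. The sketch below also shows a :has() selector for the case where the order is not guaranteed; it assumes a Beautiful Soup version recent enough to ship with the soupsieve selector engine.

```python
from bs4 import BeautifulSoup

html = '''
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# select_one stops at the first matching element (relies on document order)
first_price = soup.select_one('.pi--secondary-price .pi--price').get_text()

# :has() picks the span by the abbr it contains, independent of order
kg_price = soup.select_one('span.pi--price:has(abbr[title="Kilogram"])').get_text()

print(first_price)
print(kg_price)
```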

Scraping div with a data- attribute using Python and BeautifulSoup

I have to scrape a web page using BeautifulSoup in Python. I want to extract the complete div that has the relevant information, which looks like the one below:
<div data-v-24a74549="" class="row row-mg-mod term-row">
I wrote soup.find('div',{'class':'row row-mg-mod term-row'}).
But it returns nothing. I guess it has something to do with this data-v value.
Can someone tell me the exact syntax for scraping this type of data?
Give this a try:
from bs4 import BeautifulSoup
content = """
<div data-v-24a74549="" class="row row-mg-mod term-row">"""
soup = BeautifulSoup(content,'html.parser')
for div in soup.find_all("div", {"class" : "row"}):
    print(div)
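Two other ways to target that div, sketched under the assumption that the element is present in the HTML you downloaded: a CSS selector that requires all three classes regardless of their order, and a lookup on the data- attribute itself. The "term" text inside the div is a placeholder added for illustration.

```python
from bs4 import BeautifulSoup

# "term" is hypothetical filler so the div has some content to print
content = '<div data-v-24a74549="" class="row row-mg-mod term-row">term</div>'
soup = BeautifulSoup(content, 'html.parser')

# Chained class selectors match a div carrying all three classes, in any order
by_classes = soup.select_one('div.row.row-mg-mod.term-row')

# attrs={name: True} matches any div that has the attribute, whatever its value
by_attr = soup.find('div', attrs={'data-v-24a74549': True})

print(by_classes.get_text())
print(by_attr is by_classes)
```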

Web scraping - Get text from a class with BeautifulSoup and Python?

I want to scrape the text ("Showing 650 results") from a website.
The result I am looking for is:
Result : Showing 650 results
The following is the Html code:
<div class="jobs-search-results__count-sort pt3">
<div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
Showing 650 results
</div>
Python code:
response = requests.get(index_url)
soup = BeautifulSoup(response.text, 'html.parser')
text = {}
link = "jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4"
for div in soup.find_all('div',attrs={"class" : link}):
    text[div.text]
text
So far it looks like my code is not working.
You don't need soup.find_all if you're looking for only one element; soup.find works just as well.
You can use tag.string or tag.text to access the inner text (tag.contents returns the list of child nodes instead):
div = soup.find('div', {"class" : link})
text = div.string
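Put together, a self-contained version could look like the sketch below. It parses the HTML from the question as an inline string instead of fetching index_url, and it matches on a single class name rather than the full class string, which is a simplification on my part.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, with the closing tag of the outer div added
html = '''
<div class="jobs-search-results__count-sort pt3">
<div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
Showing 650 results
</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# class_ matches any element whose class list contains this one name
div = soup.find('div', class_='jobs-search-results__count-string')

# strip=True removes the newlines surrounding the text
text = div.get_text(strip=True)
print('Result :', text)
```

On the live page soup.find may return None if the element is missing from the response, so checking div before calling get_text is a reasonable precaution.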
Old: from BeautifulSoup import BeautifulSoup
"Development on the 3.x series of Beautiful Soup ended in 2011, and the series will be discontinued on January 1, 2021, one year after the Python 2 sunsetting date."
New: from bs4 import BeautifulSoup
"Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree."
