How does Beautiful Soup extract class attribute values? - python

I'm using Beautiful Soup to read a class attribute that holds several class names, but the list ['fa', 'fa-address-book-o'] is not the result I want.
from bs4 import BeautifulSoup
html = "<i class='fa fa-address-book-o' aria-hidden='true'></i>"
soup = BeautifulSoup(html, "lxml")
h2 = soup.find_all("i")
I want the output to be:
fa fa-address-book-o

Beautiful Soup treats class as a multi-valued attribute and returns it as a list. Join the elements of that list with a space between them:
from bs4 import BeautifulSoup
html = "<i class='fa fa-address-book-o' aria-hidden='true'></i>"
soup = BeautifulSoup(html, "lxml")
h2 = soup.find_all("i")
print(' '.join(h2[0]['class']))
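A minimal sketch of the two ways to get the class value as one string. It uses html.parser instead of lxml only to avoid an extra dependency, and it assumes a Beautiful Soup release that accepts the multi_valued_attributes keyword; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = "<i class='fa fa-address-book-o' aria-hidden='true'></i>"

# Default behaviour: class is multi-valued, so it comes back as a list.
soup = BeautifulSoup(html, "html.parser")
class_list = soup.find("i")["class"]
joined = " ".join(class_list)

# Assumption: this bs4 version supports multi_valued_attributes=None,
# which keeps the attribute as the raw string from the markup.
raw_soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
raw_class = raw_soup.find("i")["class"]

print(class_list)
print(joined)
print(raw_class)
```

Joining the list is the safer choice when you cannot control how the soup object is created.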


Get html text with Beautiful Soup

I'm trying to get the number from inside a div:
<div class="tv-symbol-price-quote__value js-symbol-last">122.7<span class="">8</span></div>
I need the 122.7 number, but I can't get it. I have tried:
strings = soup.find("div", class_="tv-symbol-price-quote__value js-symbol-last").string
But the div contains more than one child, so I get None.
Is there a way to iterate over the children and get the string from each child?
Use .get_text() (spelled getText() in older code).
For example:
from bs4 import BeautifulSoup
sample_html = """
<div class="tv-symbol-price-quote__value js-symbol-last">122.7<span class="">8</span></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
strings = soup.find("div", class_="tv-symbol-price-quote__value js-symbol-last").get_text()
Or call next() on the .strings generator to get only the 122.7:
soup = BeautifulSoup(sample_html, "html.parser")
strings = next(soup.find("div", class_="tv-symbol-price-quote__value js-symbol-last").strings)
To get only the first text node, find the tag and read its next_element attribute.
from bs4 import BeautifulSoup
html = """
<div class="tv-symbol-price-quote__value js-symbol-last">122.7<span class="">8</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
print(soup.find("div", class_="tv-symbol-price-quote__value js-symbol-last").next_element)
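To see why .string returns None here and how the alternatives differ, the following sketch runs them side by side on the markup from the question. The variable names are illustrative, and matching on the single class js-symbol-last is a shortcut chosen for this example.

```python
from bs4 import BeautifulSoup

sample_html = (
    '<div class="tv-symbol-price-quote__value js-symbol-last">'
    '122.7<span class="">8</span></div>'
)
soup = BeautifulSoup(sample_html, "html.parser")
div = soup.find("div", class_="js-symbol-last")

# .string is None because the div has two children (a text node and a span).
only_string = div.string

# get_text() concatenates every text node below the tag.
full_text = div.get_text()

# The first item of the .strings generator is the leading text node.
first_text = next(div.strings)

# Restrict the search to direct text children of the div.
direct_text = div.find(string=True, recursive=False)

print(only_string, full_text, first_text, direct_text)
```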
You could use Selenium to locate the element and then parse its HTML with Beautiful Soup.
For example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
# Load the page with driver.get(...) first, then replace the placeholder XPath below.
element = driver.find_element(By.XPATH, """yourelementxpath""")
element_html = element.get_attribute("outerHTML")
soup = BeautifulSoup(element_html, 'html.parser')
text = soup.get_text()
I haven't worked with Selenium and bs4 in a while, so you might have to tweak this a little.

python nested Tags (beautiful Soup)

I'm using Beautiful Soup in Python to scrape data from a website. The element I found holds two prices, and I only want the price per gram (g).
This is the HTML:
<div class="promoPrice margBottom7">16,000 L.L./200g<br/><span class="kiloPrice">79,999 L.L./Kg</span></div>
I use this code:
p_price = product.findAll("div", {"class": "promoPrice margBottom7"})[0].text
My result was:
16,000 L.L./200g 79,999 L.L./Kg
but I want:
16,000 L.L./200g
You will need to first decompose the span inside the div element:
from bs4 import BeautifulSoup
h = """
<div class="promoPrice margBottom7">16,000 L.L./200g<br/>
<span class="kiloPrice">79,999 L.L./Kg</span></div>
"""
soup = BeautifulSoup(h, "html.parser")
element = soup.find("div", {'class': 'promoPrice'})
# Remove the nested span from the tree so only the first price remains.
element.span.decompose()
print(element.text.strip())
# 16,000 L.L./200g
Try using soup.select_one('div.promoPrice').contents[0]:
from bs4 import BeautifulSoup
html = """<div class="promoPrice margBottom7">16,000 L.L./200g<br/>
<span class="kiloPrice">79,999 L.L./Kg</span></div>"""
soup = BeautifulSoup(html, features='html.parser')
# value = soup.select_one('div.promoPrice > span').text  # for 79,999 L.L./Kg
value = soup.select_one('div.promoPrice').contents[0]
print(value)
# 16,000 L.L./200g
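Both answers work, but decompose() changes the tree and contents[0] depends on the text being the very first child. As a hedged alternative, this sketch reads the first non-empty text node with stripped_strings and leaves the span in place. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="promoPrice margBottom7">16,000 L.L./200g<br/>
<span class="kiloPrice">79,999 L.L./Kg</span></div>"""
soup = BeautifulSoup(html, "html.parser")
div = soup.select_one("div.promoPrice")

# First non-empty text node of the div: the price per 200g.
gram_price = next(div.stripped_strings)

# The nested span is untouched, so the kilo price is still available.
kilo_price = div.select_one("span.kiloPrice").get_text(strip=True)

print(gram_price)
print(kilo_price)
```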

Python BeautifulSoup - get values from p

html = '<p class="product-new-price">96<sup>33</sup> <span class="tether-target tether-enabled tether-element-attached-top tether-element-attached-left tether-target-attached-top tether-target-attached-right">Lei</span></p>'
soup = BeautifulSoup(html, 'html.parser')
sup_elem = soup.find("sup").string # 33 - it works
How do I get the "96" that comes before the <sup> element?
You can read the previous_sibling of the <sup> tag, which is the text node "96":
from bs4 import BeautifulSoup
html = '''<p class="product-new-price">96<sup>33</sup> <span class="tether-target tether-enabled tether-element-attached-top tether-element-attached-left tether-target-attached-top tether-target-attached-right">Lei</span></p>'''
soup = BeautifulSoup(html, 'html.parser')
elem1 = soup.find("sup").previous_sibling  # 96
elem2 = soup.find("sup").text  # 33 - it works
print('.'.join([elem1, elem2]))  # 96.33
You can use the children attribute. It yields all the children of the p tag, and 96 is the first of them.
from bs4 import BeautifulSoup
html = '<p class="product-new-price">96<sup>33</sup> <span class="tether-target tether-enabled tether-element-attached-top tether-element-attached-left tether-target-attached-top tether-target-attached-right">Lei</span></p>'
soup = BeautifulSoup(html, 'html.parser')
elem = list(soup.find("p").children)[0]  # the first child is the text node "96"
sup_elem = soup.find("sup").string
result = elem + '.' + sup_elem  # 96.33
Use select instead.
from bs4 import BeautifulSoup
html = '''<p class="product-new-price">96<sup>33</sup> <span class="tether-target tether-enabled tether-element-attached-top tether-element-attached-left tether-target-attached-top tether-target-attached-right">Lei</span></p>'''
soup = BeautifulSoup(html, 'html.parser')
There is no "." in the source, but you can always join the digits and divide by 100.
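One way the select-based idea could look, as a sketch rather than the answerer's exact code: select the p tag, take its direct text node and the sup text, then join the digits and divide by 100. The class list on the span is shortened here, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<p class="product-new-price">96<sup>33</sup> '
    '<span class="tether-target">Lei</span></p>'
)
soup = BeautifulSoup(html, "html.parser")
price_tag = soup.select_one("p.product-new-price")

# First direct text child of the p tag: the whole part of the price.
whole = price_tag.find(string=True, recursive=False).strip()

# Text of the nested sup tag: the fractional part.
cents = price_tag.select_one("sup").get_text(strip=True)

# The markup has no decimal point, so join the digits and divide by 100.
price = int(whole + cents) / 100
print(price)
```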

how to extract an attribute value of div using BeautifulSoup

I have a div whose id is "img-cont"
<div class="img-cont-box" id="img-cont" style='background-image: url("");'>
I want to extract the URL from background-image using Beautiful Soup. How can I do it?
You can use find_all, or find for the first match.
import re
from bs4 import BeautifulSoup
# html_str holds the markup that contains the div
soup = BeautifulSoup(html_str, 'html.parser')
result = soup.find('div', attrs={'id': 'img-cont', 'style': True})
if result is not None:
    url = re.findall(r'\("(http.*)"\)', result['style'])  # returns a list
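Try this:
import re
from bs4 import BeautifulSoup
html = '''\
<div class="img-cont-box" \
id="img-cont" \
style='background-image: url("");'>\
</div>'''
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', id='img-cont')
# search() returns None when url("...") is empty, so check before calling group()
match = re.search(r'url\("(.+)"\)', div['style'])
print(match.group(1) if match else None)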
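The two answers can be combined into a small helper that tolerates single quotes, double quotes, or no quotes around the URL. This is a sketch under assumptions: the URL is a placeholder, since the one in the question was not preserved, and extract_background_url is a hypothetical name, not part of Beautiful Soup.

```python
import re
from bs4 import BeautifulSoup

# Placeholder URL: the real one from the question is not available.
html = """<div class="img-cont-box" id="img-cont"
style='background-image: url("https://example.com/banner.png");'></div>"""

# Capture whatever sits inside url(...), with optional quotes around it.
URL_IN_STYLE = re.compile(r"""url\(\s*["']?(.*?)["']?\s*\)""")

def extract_background_url(style):
    """Return the URL inside url(...), or None if it is missing or empty."""
    match = URL_IN_STYLE.search(style or "")
    return match.group(1) if match and match.group(1) else None

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="img-cont")
background_url = extract_background_url(div.get("style")) if div else None
print(background_url)
```

Using div.get("style") instead of div["style"] avoids a KeyError when the attribute is absent.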

how to print only text beautifulsoup

I am trying to learn how beautifulsoup works in order to create an application.
I am able to find and print all elements with .find_all(), but the output includes the HTML tags as well. How can I print ONLY the text within these tags?
This is what I have:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
print(i)
This may help you:
from bs4 import BeautifulSoup
source_code = """<html>...</html>"""  # markup elided in the original post
soup = BeautifulSoup(source_code, "html.parser")
print(soup.text)
soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all('p')
for p in i:
    print(p.text)
find_all() returns a list of tags; iterate over it and use tag.text to get the text inside each tag.
A more concise way:
for p in soup.find_all('p'):
    print(p.text)
I think you can do what they do in this Stack Overflow question: use find_all(string=True) (written findAll(text=True) in older versions). So in your code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('index.html'), "html.parser")
i = soup.find_all(string=True)
print(i)
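For comparison, here is a Python 3 sketch of the approaches above on a small inline document. The HTML is made up for illustration and stands in for index.html.

```python
from bs4 import BeautifulSoup

# Illustrative markup, used here in place of index.html.
html = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>bold</b> paragraph.</p>"
    "<p>Second paragraph.</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Text of each <p>, with nested tags flattened into plain text.
paragraph_texts = [p.get_text() for p in soup.find_all("p")]

# Every text node in the document, stripped of surrounding whitespace.
all_strings = list(soup.stripped_strings)

for text in paragraph_texts:
    print(text)
print(all_strings)
```

The per-tag loop keeps each paragraph together, while stripped_strings splits the text at every nested tag.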
