Best approach to get attribute text with BeautifulSoup

What would be the best way to get the text of the class="field__label" and class="field__item" elements in the following code?
Note that there are other tags with the same classes outside the div class="fieldset-wrapper"; I only need the ones inside this tag.
HTML example:
<div class="fieldset-wrapper">
<div class="field field--name-field-adresse-strasse-nr field--type-string field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
<div class="field field--name-field-adresse-plz-ort field--type-string field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
<div class="field field--name-field-adressen-bundesland field--type-entity-reference field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
</div>

You can use CSS selectors to ensure that your target elements are descendants of the div class="fieldset-wrapper" element:
for item in soup.select('div.fieldset-wrapper div.field__item, div.fieldset-wrapper div.field__label'):
    print(item.text)
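
If the labels and values need to stay paired, a minimal sketch along the same lines is to select each field block inside the wrapper and read both children from it. The sample values (Street, Main St 1, and so on), the extra element outside the wrapper, and the html.parser backend are assumptions made for illustration; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: the text values are invented for this sketch.
html = """
<div class="field__item">outside</div>
<div class="fieldset-wrapper">
  <div class="field field--label-inline">
    <div class="field__label">Street</div>
    <div class="field__item">Main St 1</div>
  </div>
  <div class="field field--label-inline">
    <div class="field__label">City</div>
    <div class="field__item">Berlin</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
# Only div.field blocks that are descendants of the wrapper are visited.
for field in soup.select("div.fieldset-wrapper div.field"):
    label = field.select_one("div.field__label")
    item = field.select_one("div.field__item")
    if label is not None and item is not None:
        pairs.append((label.get_text(strip=True), item.get_text(strip=True)))

print(pairs)
```

The element outside the wrapper is never visited, so it does not show up in pairs.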

Related

How to parse with BeautifulSoup in Python?

For example I have code like this
<div class="container">
<div class="blablabla1">
<div class="blablabla2">
<div class="blablabla3">
<span class="hello">Hello</span>
</div>
</div>
</div>
</div>
How can I get the <span> text or the <span> class value?
Should I find all the containers first?

Your question is not entirely clear, but in general you can access the class and text with a CSS selector like this:
from bs4 import BeautifulSoup
html = '''
<div class="container">
<div class="blablabla1">
<div class="blablabla2">
<div class="blablabla3">
<span class="hello">Hello</span>
</div>
</div>
</div>
</div>
'''
soup = BeautifulSoup(html, "lxml")
spanText = soup.select_one('div.container span').text
spanClass = soup.select_one('div.container span')['class']

Once you have obtained the soup, you can use the find_all() method to find all <span> tags with the hello class:
all_hello_spans = soup.find_all("span", class_="hello")
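
As a small self-contained sketch of the find_all() approach, the snippet below reads both the text and the class value from every matching span. The shortened markup and the html.parser backend are assumptions of the example.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question.
html = '<div class="container"><div class="blablabla3"><span class="hello">Hello</span></div></div>'
soup = BeautifulSoup(html, "html.parser")

# There is no need to find the containers first unless the search must be scoped to them.
spans = soup.find_all("span", class_="hello")

texts = [span.get_text() for span in spans]
# class is a multi-valued attribute, so each entry is a list of class names.
classes = [span["class"] for span in spans]

print(texts, classes)
```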

Find xpath with following-siblings and contains text in Python Selenium

I know the very basics of using following-sibling, but here the situation looks a bit more complicated.
I want to find the element with the text Total 6.5 where the header is Total games. How can I do it with following-sibling and contains()?
<div class="group">
<div class="header_1">
<div class="section_1">
<div class="expander"></div>
<div class="star"></div>
<div class="text_3">Total games</div>
</div>
</div>
<div class="body_1">
<div class="horizontal">
<div class="grid">
<div class="row_common">
<div class="cell_wrap">
<div class="cell_align_wrap">
<div class="common_text">Total 6.5</div>
</div>
</div>
</div>
<div class="row_common">
...
</div>
</div>
</div>
</div>
</div>

This one should locate the required element:
//div[@class="header_1" and contains(., "Total games")]/following-sibling::div[@class="body_1"]//div[@class="common_text"]
You can also simplify it as:
//div[@class="header_1" and contains(., "Total games")]/following::div[@class="common_text"]
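
One way to check such an expression without a browser is to run it against the markup with lxml. This is only a sketch: lxml stands in for Selenium here, the markup is trimmed from the question, and the extra contains(., "Total 6.5") predicate is an addition that narrows the match to the requested cell.

```python
from lxml import html

# Trimmed copy of the markup from the question.
page = html.fromstring("""
<div class="group">
  <div class="header_1">
    <div class="section_1">
      <div class="text_3">Total games</div>
    </div>
  </div>
  <div class="body_1">
    <div class="grid">
      <div class="row_common">
        <div class="common_text">Total 6.5</div>
      </div>
    </div>
  </div>
</div>
""")

xpath = (
    '//div[@class="header_1" and contains(., "Total games")]'
    '/following-sibling::div[@class="body_1"]'
    '//div[@class="common_text"][contains(., "Total 6.5")]'
)

matches = page.xpath(xpath)
print([m.text for m in matches])
```

In Selenium 4 the same string can be passed to driver.find_element(By.XPATH, xpath).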

Get xpath sibling text value - Python Selenium

Requirement:
/html/body/div[3]/div[4]/div/div[7]/div/div/div/div/p/b - Contains word "TITLE"
/html/body/div[3]/div[4]/div/div[8]/div/div/div/div/p - Contains "This is my description"
Actual HTML:
<div class="secadvheading section">
<div class="section-custom">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p class="mt-15"><b>TITLE</b></p>
</div>
</div>
</div>
</div>
</div>
<div class="paragraphText parbase section">
<div class="section-custom ">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p>This is my description</p>
</div>
</div>
</div>
</div>
</div>
Question:
How do I get the text of the paragraph that follows the "TITLE" div?
Tried
driver.find_element_by_xpath("//*[contains(text(),'TITLE')]/following-sibling::p")
but it didn't work. I may have multiple "TITLE" blocks on the same page. How can I gracefully find each TITLE div and get its description?

You need to step out of the TITLE node first: go up to the ancestor div, then use following-sibling. Try this:
//b[text()='TITLE']/ancestor::div[@class='secadvheading section']/following-sibling::div[@class='paragraphText parbase section']//p
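
For a page with several TITLE blocks, a possible sketch is to loop over the headings and evaluate the relative part of the expression from each one. It uses lxml rather than Selenium so it can run offline. The two-block page and its description texts are invented for the example, and the [1] predicate is an addition that keeps only the nearest following description.

```python
from lxml import html

# Hypothetical page with two heading/description pairs; the texts are invented.
page = html.fromstring("""
<div>
  <div class="secadvheading section"><div class="row"><p class="mt-15"><b>TITLE</b></p></div></div>
  <div class="paragraphText parbase section"><div class="row"><p>First description</p></div></div>
  <div class="secadvheading section"><div class="row"><p class="mt-15"><b>TITLE</b></p></div></div>
  <div class="paragraphText parbase section"><div class="row"><p>Second description</p></div></div>
</div>
""")

descriptions = []
for title in page.xpath("//b[text()='TITLE']"):
    # Climb to the heading block, then take the closest following description block.
    found = title.xpath(
        "ancestor::div[@class='secadvheading section']"
        "/following-sibling::div[@class='paragraphText parbase section'][1]//p"
    )
    descriptions.extend(p.text_content().strip() for p in found)

print(descriptions)
```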

python - html - how to change a position of a closing part of a tag / move whole section

I want to change the position of a tag's closing part by removing it from one place and inserting it in another. I tried BeautifulSoup, but its functions seem to work on whole tags. I don't know how to move just one part of a tag, like </div>, without destroying the preceding part of the tag.
Example:
html = """
<html>
<body>
<div>
<div class="A">
<h1 id="H1">H1</h1>
</div>
<div>
<div class="B">
</div>
</div> < ----- remove from here
<div class="b1">
<div class="c">
</div>
</div>
< ----- place here
</div>
</body>
</html>
"""
soup = BeautifulSoup(html, 'html.parser')
One of my ideas is to cut the section
<div class="b1">
<div class="c">
</div>
</div>
and place it after <div class="B"> using the insert_after function, but I don't know how to move the whole section in one step.

Moving that </div> further down is, in effect, the same as moving the b1 div into the unnamed div that follows the A div, right after the B div. So you could copy the b1 div, append the copy to that div, and then delete the original. This could be done as follows:
from bs4 import BeautifulSoup
import copy
html = """
<html>
<body>
<div>
<div class="A">
<h1 id="H1">H1</h1>
</div>
<div>
<div class="B">
</div>
</div>
<div class="b1">
<div class="c">
</div>
</div>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html, 'html.parser')
div_append = soup.find('div', class_='A').find_next('div')
div_b1 = soup.find('div', class_='b1')
div_append.append(copy.copy(div_b1))
div_b1.extract()
print(soup.prettify())
This would result in the following HTML:
<html>
<body>
<div>
<div class="A">
<h1 id="H1">
H1
</h1>
</div>
<div>
<div class="B">
</div>
<div class="b1">
<div class="c">
</div>
</div>
</div>
</div>
</body>
</html>
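
As a possible shortcut, append() on a tag that is already in the tree moves it rather than duplicating it, so the copy and extract steps can be dropped. The sketch below uses a compacted version of the same markup and find_next_sibling() to reach the unnamed div; both are choices made for this example.

```python
from bs4 import BeautifulSoup

# Compacted version of the markup from the question.
html = """
<div>
<div class="A"><h1 id="H1">H1</h1></div>
<div><div class="B"></div></div>
<div class="b1"><div class="c"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("div", class_="A").find_next_sibling("div")
div_b1 = soup.find("div", class_="b1")

# The tag is detached from its old parent before being appended, so this is a move.
target.append(div_b1)

print(div_b1.parent is target)
print(len(soup.find_all("div", class_="b1")))
```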

wrapping html with a python function

I want to be able to wrap a div based on its id. For example, given the following HTML:
<body>
<div id="info">
<div id="a1">
</div>
<div id="a2">
<div id="description">
</div>
<div id="links">
link
</div>
</div>
</div>
</body>
I want to write a Python function that takes a document, an id, and a selector, and wraps the element with the given id in a div with the given class or id selector. For example, if the HTML above is in a variable doc, then
wrap(doc,'#a2','#wrapped')
will return the following HTML:
<body>
<div id="info">
<div id="a1">
</div>
<div id="wrapped">
<div id="a2">
<div id="description">
</div>
<div id="links">
link
</div>
</div>
</div>
</div>
</body>
I looked at some XML parsers and Python HTMLParser, but I have not found anything that gives me the capability to not only get everything inside a specific tag, but then be able to append strings and easily edit the document. If one does not exist, what would be a good approach to this?

# Legacy BeautifulSoup 3 API (Python 2 only)
from BeautifulSoup import BeautifulSoup
# div1 is to be wrapped with div2
def wrap(doc, div1_id, div2_id):
    pool = BeautifulSoup(doc)
    for div in pool.findAll('div', attrs={'id': div1_id}):
        div.replaceWith('<div id=' + div2_id + '>' + div.prettify() + '</div>')
    return pool.prettify()
wrap(doc, 'a2', 'wrapped')

I recommend BeautifulSoup; it adds a dependency but also brings a lot of convenience. The following code achieves the goal of the wrap:
from bs4 import BeautifulSoup
data = '''<body>
<div id="info">
<div id="a1">
</div>
<div id="a2">
<div id="description">
</div>
<div id="links">
<a href="http://example.com">link</a>
</div>
</div>
</div>
</body>'''
soup = BeautifulSoup(data, "lxml")  # name the parser explicitly to avoid the "no parser specified" warning
div = soup.find('div', attrs={'id': 'a2'})
div.wrap(soup.new_tag('div', id='wrapper'))
Printing soup.prettify() then shows the result:
<html>
<body>
<div id="info">
<div id="a1">
</div>
<div id="wrapper">
<div id="a2">
<div id="description">
</div>
<div id="links">
<a href="http://example.com">
link
</a>
</div>
</div>
</div>
</div>
</body>
</html>
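
To get closer to the wrap(doc, '#a2', '#wrapped') signature from the question, the same idea can be put in a small function. This is a sketch under assumptions: the handling of the wrapper argument ('#id' or '.class' only) is a simplification invented here, html.parser is used so no <html> element is added, and the markup is a shortened version of the question's.

```python
from bs4 import BeautifulSoup


def wrap(doc, target, wrapper):
    """Wrap the first element matching the CSS selector `target` in a new div.

    `wrapper` is '#some-id' or '.some-class'. Compound selectors are not
    handled; that restriction is an assumption of this sketch.
    """
    soup = BeautifulSoup(doc, "html.parser")
    element = soup.select_one(target)
    if element is None:
        return str(soup)  # nothing matched, so return the document unchanged
    if wrapper.startswith("#"):
        attrs = {"id": wrapper[1:]}
    else:
        attrs = {"class": wrapper.lstrip(".")}
    element.wrap(soup.new_tag("div", attrs=attrs))
    return str(soup)


doc = '<body><div id="info"><div id="a1"></div><div id="a2"><div id="links">link</div></div></div></body>'
result = wrap(doc, "#a2", "#wrapped")
print(result)
```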

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
