I'm modifying an HTML file using Python and BeautifulSoup, and I can change the content of headers, but I couldn't find a way to change the class of a div. My goal is to turn
<div id="div1" class="blue_titles">test</div>
into:
<div id="div1" class="green_titles">test</div>
I looked up and down the docs, but to no avail. It's probably right in front of my face, but I can't find it. Thanks in advance!
You can simply assign the new value to the key class:
from bs4 import BeautifulSoup
soup = BeautifulSoup("""<div id="div1" class="blue_titles">test</div>""", "lxml")
soup.find("div")['class'] = "green_titles"
soup
# <html><body><div class="green_titles" id="div1">test</div></body></html>
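If the div carries more than one class, it may help to know that BeautifulSoup exposes class as a list when it parses the document. The sketch below swaps only blue_titles and keeps the rest; the extra class "big" is a made-up addition for illustration, and "html.parser" is used just to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# "big" is a hypothetical second class, not part of the original question.
soup = BeautifulSoup('<div id="div1" class="blue_titles big">test</div>', "html.parser")
div = soup.find("div", id="div1")

# A parsed class attribute is a list of class names.
original_classes = list(div["class"])

# Replace only blue_titles and leave any other class untouched.
div["class"] = ["green_titles" if c == "blue_titles" else c for c in div["class"]]

print(original_classes)
print(div)
```

Assigning a plain string, as in the answer above, is enough when the element has a single class.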
<div class="ELEMENT1">
<div class="ELEMENT2">
<div class="ELEMENT3">valeur1</div>
<div class="ELEMENT4">
<svg class="ELEMENT5 ">
<a href="ELEMENT6" target="ELEMENT7" class="ELEMENT8">
<div>TEXT</div>
Hello all,
My request is the following.
From the piece of code above, I want to write a loop that extracts TEXT if and only if the div class is ELEMENT4 and the svg class is ELEMENT5 (because there are other, different ones).
Thank you for your help,
eddy
You'll need to import urllib.request (urllib2 on Python 2) or some other library that lets you fetch a URL's HTML. You also need to import BeautifulSoup. Fetch the URL, store the parsed page in a variable, and then reformat the output in whatever way serves your needs.
For example:
from urllib.request import urlopen
from bs4 import BeautifulSoup
page = urlopen("the_url")
content = BeautifulSoup(page.read().decode("utf-8"), "html.parser")  # decode the response as UTF-8 and parse it
divs = content.find_all("div")  # find all div elements in the document
Then you could use get_text() or a regular expression to pull the actual text out of each element.
Good luck on your assignment!
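To make the "if and only if" condition concrete, here is a minimal sketch. The question's snippet is cut off, so the closing tags and the nesting are assumptions: it supposes the svg and the link both sit inside the ELEMENT4 div. The second block, with classes OTHER and SKIP, is invented purely to show a non-matching case.

```python
from bs4 import BeautifulSoup

# Hypothetical completion of the truncated snippet from the question.
html = """
<div class="ELEMENT1">
  <div class="ELEMENT2">
    <div class="ELEMENT3">valeur1</div>
    <div class="ELEMENT4">
      <svg class="ELEMENT5"></svg>
      <a href="ELEMENT6" target="ELEMENT7" class="ELEMENT8"><div>TEXT</div></a>
    </div>
    <div class="ELEMENT4">
      <svg class="OTHER"></svg>
      <a href="x" class="ELEMENT8"><div>SKIP</div></a>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
texts = []
for block in soup.find_all("div", class_="ELEMENT4"):
    # Skip ELEMENT4 blocks that do not contain an svg with class ELEMENT5.
    if block.find("svg", class_="ELEMENT5") is None:
        continue
    link = block.find("a", class_="ELEMENT8")
    if link is not None:
        texts.append(link.get_text(strip=True))

print(texts)  # ['TEXT']
```

If the real page nests these elements differently, the two find calls would need to be adjusted to match.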
I'm trying to get links to group members:
response.css('.text--ellipsisOneLine::attr(href)').getall()
Why isn't this working?
html:
<div class="flex flex--row flex--noGutters flex--alignCenter">
<div class="flex-item _memberItem-module_name__BSx8i">
<a href="/ru-RU/Connect-IT-Meetup-in-Chisinau/members/280162178/profile/?returnPage=1">
<h4 class="text--bold text--ellipsisOneLine">Liviu Cernei</h4>
</a>
</div>
</div>
Your selector isn't working because you are looking for an attribute (href) that this element doesn't have.
response.css('.text--ellipsisOneLine::attr(href)').getall()
This selector searches for href on elements with the class text--ellipsisOneLine. In your HTML snippet, that class matches only this element:
<h4 class="text--bold text--ellipsisOneLine">Liviu Cernei</h4>
As you can see, there is no href attribute. If you want the text inside this h4 element, you need to use the ::text pseudo-element.
response.css('.text--ellipsisOneLine::text').getall()
Read more in the Scrapy selectors documentation.
I realize that this isn't scrapy, but personally for web scraping I use the requests module and BeautifulSoup4, and the following code snippet will get you a list of users with the aforementioned modules:
import requests
from bs4 import BeautifulSoup
response = requests.get('https://www.meetup.com/ru-RU/Connect-IT-Meetup-in-Chisinau/members/')
if response.status_code == 200:
    html_doc = response.text
    html_source = BeautifulSoup(html_doc, 'html.parser')
    users = html_source.find_all('h4')
    for user in users:
        print(user.text)
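Since the question asks for the links rather than the names, the same BeautifulSoup approach can be extended by walking from each h4 up to its enclosing a tag. This is only a sketch run against the HTML fragment from the question, not against the live page, whose markup may differ.

```python
from bs4 import BeautifulSoup

# The HTML fragment quoted in the question.
html = """
<div class="flex flex--row flex--noGutters flex--alignCenter">
  <div class="flex-item _memberItem-module_name__BSx8i">
    <a href="/ru-RU/Connect-IT-Meetup-in-Chisinau/members/280162178/profile/?returnPage=1">
      <h4 class="text--bold text--ellipsisOneLine">Liviu Cernei</h4>
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
members = []
for h4 in soup.find_all("h4", class_="text--ellipsisOneLine"):
    # The href lives on the <a> that wraps the heading, not on the heading itself.
    link = h4.find_parent("a")
    if link is not None and link.has_attr("href"):
        members.append((h4.get_text(strip=True), link["href"]))

print(members)
```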
css:
response.css('.member-item .flex--alignCenter a::attr(href)').getall()
I'm working on a personal project where I scrape data from a website. I'm trying to use BeautifulSoup to do this, but I came across elements that share the same class and differ only in an attribute. For example:
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
How do I just get $11.99/kg? Right now I'm getting
$11.99 /kg
$5.44 /lb.
I've done x.select('.pi--secondary-price') but it returns both prices. How do I only get 1 price ($11.99 /kg)?
You could first get the <abbr> tag and then search for the respective parent tag. Like this:
from bs4 import BeautifulSoup
html = '''
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
kg = soup.find(title="Kilogram")
print(kg.parent.text)
This gives you the desired output $11.99 /kg. For more information, see the BeautifulSoup documentation.
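An alternative that selects the span directly is a CSS selector with :has(), which recent BeautifulSoup versions support through the Soup Sieve package. This is a sketch under that assumption; it picks the pi--price span that contains an abbr titled Kilogram.

```python
from bs4 import BeautifulSoup

html = '''
<div class="pi--secondary-price">
<span class="pi--price">$11.99 /<abbr title="Kilogram">kg</abbr></span>
<span class="pi--price">$5.44 /<abbr title="Pound">lb.</abbr></span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Match only the price span whose abbr has title="Kilogram".
kg_span = soup.select_one('span.pi--price:has(abbr[title="Kilogram"])')
kg_price = kg_span.get_text()

print(kg_price)  # $11.99 /kg
```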
I have to scrape a web page using BeautifulSoup in Python. I want to extract the complete div that has the relevant information, which looks like the one below:
<div data-v-24a74549="" class="row row-mg-mod term-row">
I wrote soup.find('div',{'class':'row row-mg-mod term-row'}).
But it is returning nothing. I guess it has something to do with this data-v value.
Can someone tell me the exact syntax for scraping this type of data?
Give this a try:
from bs4 import BeautifulSoup
content = """
<div data-v-24a74549="" class="row row-mg-mod term-row">"""
soup = BeautifulSoup(content,'html.parser')
for div in soup.find_all("div", {"class" : "row"}):
    print(div)
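Searching for the single class row may also match other rows on a real page. If all three classes are required, one option is a CSS selector that chains them, which matches regardless of the order in which they appear in the attribute. In this sketch the text "info" is a placeholder added for illustration.

```python
from bs4 import BeautifulSoup

# "info" is a hypothetical placeholder for the div's real content.
content = '<div data-v-24a74549="" class="row row-mg-mod term-row">info</div>'
soup = BeautifulSoup(content, "html.parser")

# Require all three classes on the same div; their order does not matter.
matches = soup.select("div.row.row-mg-mod.term-row")

for div in matches:
    print(div.get_text())
```

The data-v attribute plays no part in the match here; it is an ordinary attribute as far as the parser is concerned.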
I am looking for a solution using Python and BeautifulSoup to find an element based on the inside text. For example:
<div> <b>Ignore this text</b>Find based on this text </div>
How can I find this div? Thanks for your help!
You can use .find with the text argument and then use findParent to get the parent element.
Ex:
from bs4 import BeautifulSoup
s="""<div> <b>Ignore this text</b>Find based on this text </div>"""
soup = BeautifulSoup(s, 'html.parser')
t = soup.find(text="Find based on this text ")
print(t.findParent())
Output:
<div> <b>Ignore this text</b>Find based on this text </div>
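The exact-match search above depends on reproducing the text node character for character, including the trailing space. A more forgiving variant, sketched here, passes a compiled regular expression as the string argument (the newer name for text) so that a substring is enough.

```python
import re
from bs4 import BeautifulSoup

s = """<div> <b>Ignore this text</b>Find based on this text </div>"""
soup = BeautifulSoup(s, "html.parser")

# The regex is searched within each text node, so surrounding whitespace is irrelevant.
node = soup.find(string=re.compile("Find based on this text"))

# Walk up from the text node to the enclosing div.
div = node.find_parent("div")
print(div)
```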
Try this. It is similar to the example above, but it removes the <b> tag and prints the remaining text:
from bs4 import BeautifulSoup
html="""
<div> <b>Ignore this text</b>Find based on this text </div>
"""
soup = BeautifulSoup(html, 'lxml')
s = soup.find('div')
for child in s.find_all('b'):
    child.decompose()
print(s.get_text())
Output:
Find based on this text