Hi, I am using the Beautiful Soup library to parse content from an HTML page.
I use the following script to get to the part of the page I want:
review_list = soup.find(class_="review_list_score_breakdown_right")
<span class=" review_list_score_breakdown_right">
<ul class="review_score_breakdown_list list_tighten clearfix" data-et-view="bLTQHcXJVNRCSPOMcAQJO:1 bLTQHcXJVNRCSPOMcAQJO:3 " id="review_list_score_breakdown">
<li class="clearfix one_col" data-question="hotel_clean">
<p class="review_score_name">
Cleanliness
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_comfort">
<p class="review_score_name">
Comfort
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_services">
<p class="review_score_name">
Facilities
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_staff">
<p class="review_score_name">
Staff
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_value">
<p class="review_score_name">
Value for money
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_wifi">
<p class="review_score_name">
Free WiFi
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
<li class="clearfix one_col" data-question="hotel_location">
<p class="review_score_name">
Location
</p>
<div class="score_bar">
<div class="score_bar_value" data-score="100" style="width: 100%;">
</div>
</div>
<p class="review_score_value">
10
</p>
</li>
</ul>
</span>
I need to extract the score from the elements that carry a data-question attribute. For example, if I want to know the hotel comfort score, I'd need to access data-question="hotel_confort". I've tried the find() function but it doesn't work.
There is no data-question="hotel_confort" attribute in your HTML; the value there is spelled hotel_comfort.
review = soup.find(class_="review_list_score_breakdown_right")
hotel = review.find(attrs={"data-question" : "hotel_comfort"})
This code returns
<li class="clearfix one_col" data-question="hotel_comfort"> ..... </li>
I think what you need is the attrs find query.
Your question is similar to Extracting an attribute value with beautifulsoup
I will make it a bit specific for your case.
review = soup.find(class_="review_list_score_breakdown_right")
item = review.find(attrs={"data-question" : "hotel_comfort"})
output = item.find(class_="review_score_value").get_text(strip=True)
It's been awhile since I used bs4 so please debug the code.
Edit:
Here's some working code taken from your example string
review = soup.find('span', {'class' : "review_list_score_breakdown_right"})
input = review.find_all(attrs={"data-question": "hotel_comfort"})
print(input)  # prints the list of matching elements, which you can then search further
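To tie the answers above together, here is a minimal sketch that goes from a data-question value to the displayed score. The helper name score_for is made up for illustration, and the HTML is a trimmed copy of the question's markup; the sketch assumes the score lives in the review_score_value paragraph, as it does in the sample.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (two of the seven <li> items).
html = """
<span class=" review_list_score_breakdown_right">
<ul id="review_list_score_breakdown">
<li class="clearfix one_col" data-question="hotel_clean">
<p class="review_score_name">Cleanliness</p>
<p class="review_score_value">10</p>
</li>
<li class="clearfix one_col" data-question="hotel_comfort">
<p class="review_score_name">Comfort</p>
<p class="review_score_value">10</p>
</li>
</ul>
</span>
"""


def score_for(soup, question):
    """Return the score text for one data-question value, or None if absent.

    score_for is a hypothetical helper, not part of BeautifulSoup.
    """
    item = soup.find("li", attrs={"data-question": question})
    if item is None:
        return None
    value = item.find("p", class_="review_score_value")
    return value.get_text(strip=True) if value else None


soup = BeautifulSoup(html, "html.parser")
comfort = score_for(soup, "hotel_comfort")
missing = score_for(soup, "hotel_confort")  # misspelled on purpose
print(comfort, missing)
```

Returning None for an unknown value makes a misspelled attribute such as hotel_confort easy to spot instead of raising further down.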
I'm struggling with scraping a few pages. It happens when the structure of the page involves a lot of nested divs.
Here is the page's HTML:
<div>
<section class="ui-accordion-header ui-state-default ui-corner-all ui-accordion-icons" role="tab" id="ui-id-1" aria-controls="ui-id-2" aria-selected="false" aria-expanded="false" tabindex="0"><span class="ui-accordion-header-icon ui-icon ui-icon-triangle-1-e"></span>
<div class="detail-avocat">
<div class="nom-avocat">Me <span class="avocat_name">NAME </span></div>
<div class="type-avocat">Avocat postulant au Tribunal Judiciaire</div>
</div>
<div class="more-info">Plus d'informations</div>
</section>
<div class="ui-accordion-content ui-helper-reset ui-widget-content ui-corner-bottom" style="display: none;" id="ui-id-2" aria-labelledby="ui-id-1" role="tabpanel" aria-hidden="true">
<div class="details">
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Structure :</span>
<div>
<p>Cabinet individuel NAME</p>
</div>
</div>
</div>
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Adresse :</span>
<div>
<p>21 rue Belle Isle 57000 VILLE</p>
</div>
</div>
</div>
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Mail :</span>
<div>
<p>cabinet@mail.fr</p>
</div>
</div>
</div>
<div class="detail-avocat-row">
<div class="detail-avocat-content overflow-h">
<span>Tél :</span>
<div>
<p>Telnum</p>
</div>
</div>
</div>
<div class="detail-avocat-row">
<div class="detail-avocat-content overflow-h">
<span>Fax :</span>
<div>
<p> </p>
</div>
</div>
</div>
<div class="contact-avocat"> Contacter </div>
</div>
</div>
</div>
And here is my python code:
divtel = self.driver.find_elements(by=By.XPATH,
    value='//div[@class="detail-avocat-content overflow-h"]/div/p')  #div[@class="detail-avocat-content overflow-h"]')
for p in divtel:
    print(p.text)
It doesn't print anything. With other similar pages it prints the text, but in this case it doesn't, although there is text in the nested span and div/p. Do you know why?
How can I resolve my problem, please?
Thank you.
The .text property returns text only when the web element containing it is visible on the page. If the element is hidden, as it is here (the panel has style="display: none;"), use .get_attribute('innerText'), .get_attribute('textContent') or .get_attribute('innerHTML') instead; textContent returns the raw text of every node, while innerHTML also includes the markup. So, for example, change
print(p.text)
to
print(p.get_attribute('innerText'))
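As an alternative to the answer's approach (not a replacement for it), the page HTML can be handed to BeautifulSoup, which does not look at CSS visibility at all. This is a minimal sketch: hidden_details is a made-up helper name, and the HTML is a trimmed copy of the question's hidden panel.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the hidden accordion panel from the question.
page_source = """
<div style="display: none;" id="ui-id-2">
<div class="detail-avocat-content overflow-h">
<span>Adresse :</span>
<div><p>21 rue Belle Isle 57000 VILLE</p></div>
</div>
<div class="detail-avocat-content overflow-h">
<span>Fax :</span>
<div><p> </p></div>
</div>
</div>
"""


def hidden_details(html):
    """Map each label (e.g. 'Adresse') to the text of its nested <p>.

    hidden_details is a hypothetical helper used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for block in soup.find_all("div", class_="detail-avocat-content"):
        # "Adresse :" becomes "Adresse"
        label = block.find("span").get_text(strip=True).rstrip(" :")
        value = block.find("p").get_text(strip=True)
        details[label] = value
    return details


details = hidden_details(page_source)
print(details)
```

With Selenium, page_source would come from driver.page_source once the page has loaded. Empty fields such as the fax number come back as empty strings.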
The HTML is below. If the span value is less than 20%, I want to remove everything from the span up to and including its <div class="action"> ancestor, and nothing above that.
So for example:
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
</div>
</div>
From the above HTML, only this code should be removed:
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
So what should be left is:
<div class="item">
<div class="info">
</div>
</div>
This is my current python code:
items = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='content-name']")))
for item in items:
    percentage_text = re.findall(r"\d+", item.text)[0]
    if int(percentage_text) <= 20:
        driver.execute_script("arguments[0].remove();", item)
But it only removes the span element and not its parent.
Here is the full HTML. I think it needs JavaScript to remove the elements, but I am very new to JavaScript; I researched for more than 2 hours and still can't find a solution. Thank you very much.
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 95% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 32% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 15% </span>
</div>
</div>
</div>
</div>
Go up to the parent of the parent (span -> div.content -> div.action) and remove that:
driver.execute_script("arguments[0].parentElement.parentElement.remove();", item)
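The parent-walking logic can be checked offline before running it in a browser. The sketch below uses BeautifulSoup as a stand-in for the live DOM, which is a different technique from the Selenium call above; drop_low_actions and its threshold parameter are made-up names, and the threshold of 20 mirrors the question's code.

```python
import re

from bs4 import BeautifulSoup

# Two of the question's four items: one below the threshold, one above.
html = """
<div class="item"><div class="info"><div class="action"><div class="content">
<span class="content-name"> 5% </span></div></div></div></div>
<div class="item"><div class="info"><div class="action"><div class="content">
<span class="content-name"> 95% </span></div></div></div></div>
"""


def drop_low_actions(html, threshold=20):
    """Remove each div.action whose span shows a percentage <= threshold.

    drop_low_actions is a hypothetical helper used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span", class_="content-name"):
        match = re.search(r"\d+", span.get_text())
        if match and int(match.group()) <= threshold:
            # Walk up to the nearest div.action instead of counting parents.
            action = span.find_parent("div", class_="action")
            if action is not None:
                action.decompose()
    return soup


soup = drop_low_actions(html)
remaining = [s.get_text(strip=True)
             for s in soup.find_all("span", class_="content-name")]
print(remaining)
```

Looking up the ancestor by class keeps working if another wrapper is later added between the span and div.action. In the browser, the DOM method closest('div.action') expresses the same idea.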
<li class="mod-tile">
<ul class="gallery-hidden gallery">
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/1982/1982-1579465585-271404739.png" data-sub-html="Stracker's Loader " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/1982/1982-1579465585-271404739.png" />
</ul>
<div data-mod-id="1982" data-game-id="2531" class="mod-tile-left ">
<div class="expandtile">
<ul class="btnexpand btnoverlay inline-flex">
<div class="padding"></div>
<svg title="" class="icon-plus"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-plus"></use></svg> <li>
<ul>
<li><a class="mod-view" href="index.html">View mod page</a></li>
<li><a class="mod-gallery" href="index.html">View image gallery</a></li>
</ul>
</li>
</ul>
</div>
<a class="mod-image" href="index.html">
<figure class="image_figure">
<img class="back" src="https://www.nexusmods.com/assets/images/default/noimage.svg" width="600" height="338">
<div class="fore_div_mods">
<img class="fore" onerror="imgError(this,'https://www.nexusmods.com/assets/images/default/noimage.svg')" loading="lazy" src="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/1982/1982-1579465585-271404739.png" alt="Stracker's Loader" title="Stracker's Loader">
</div>
</figure>
</a>
<div class="tile-desc motm-tile">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>Stracker's Loader</h3>
<div class="meta clearfix">
<div class="category">
Utilities
</div>
<time class="date" datetime="2020-01-19 20:26"> <span class="label">Uploaded: </span>
19 Jan 2020 </time>
<div class="date"><span class="label">Last Update:</span> 04 Dec 2020</div>
<div class="realauthor"><span class="label">Author: </span> Stracker</div>
<div class="author"><span class="label">Uploader: </span> Stracker</div>
</div>
<p class="desc">
Restores full nativePC functionality. </p>
</div>
</div>
<div class="tile-data">
<ul class="clearfix">
<li class="sizecount inline-flex">
<svg title="" class="icon icon-filesize"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-filesize"></use></svg> <span class="flex-label">
703KB </span>
</li>
<li class="endorsecount inline-flex">
<svg title="" class="icon icon-endorse"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-endorse"></use></svg> <span class="flex-label">26.0k</span>
</li>
<li class="downloadcount inline-flex">
<svg title="" class="icon icon-downloads"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-downloads"></use></svg> <span class="flex-label"> -- </span>
</li>
</ul>
</div>
</div>
<div class="mod-tile-right">
<div class="tile-desc">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>Stracker's Loader</h3>
<div class="meta clearfix">
<div class="category">
Utilities
</div>
<time class="date" datetime="2020-01-19 20:26"> <span class="label">Uploaded: </span>
19 Jan 2020 </time>
<div class="date"><span class="label">Last Update:</span> 04 Dec 2020</div>
<div class="author"><span class="label">Author: </span>Stracker</div>
</div>
<p class="desc">
Restores full nativePC functionality. </p>
</div>
</div>
</div>
</li>
<li class="mod-tile">
<ul class="gallery-hidden gallery">
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/112/112-1579010242-745113274.png" data-sub-html="Souvenir's Light Pillar " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/112/112-1579010242-745113274.png" />
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/112/112-1579010135-1239485031.png" data-sub-html="Souvenir's Light Pillar " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/112/112-1579010135-1239485031.png" />
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/112/112-1579010181-399571475.png" data-sub-html="Souvenir's Light Pillar " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/112/112-1579010181-399571475.png" />
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/112/112-1579259504-344088346.png" data-sub-html="Souvenir's Light Pillar " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/112/112-1579259504-344088346.png" />
</ul>
<div data-mod-id="112" data-game-id="2531" class="mod-tile-left ">
<div class="expandtile">
<ul class="btnexpand btnoverlay inline-flex">
<div class="padding"></div>
<svg title="" class="icon-plus"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-plus"></use></svg> <li>
<ul>
<li><a class="mod-view" href="index.html">View mod page</a></li>
<li><a class="mod-gallery" href="index.html">View image gallery</a></li>
</ul>
</li>
</ul>
</div>
<a class="mod-image" href="index.html">
<figure class="image_figure">
<img class="back" src="https://www.nexusmods.com/assets/images/default/noimage.svg" width="600" height="338">
<div class="fore_div_mods">
<img class="fore" onerror="imgError(this,'https://www.nexusmods.com/assets/images/default/noimage.svg')" loading="lazy" src="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/112/112-1579010242-745113274.png" alt="Souvenir's Light Pillar" title="Souvenir's Light Pillar">
</div>
</figure>
</a>
<div class="tile-desc motm-tile">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>Souvenir's Light Pillar</h3>
<div class="meta clearfix">
<div class="category">
Visuals and Graphics
</div>
<time class="date" datetime="2018-09-03 19:58"> <span class="label">Uploaded: </span>
03 Sep 2018 </time>
<div class="date"><span class="label">Last Update:</span> 17 Jan 2020</div>
<div class="realauthor"><span class="label">Author: </span> 2hh8899</div>
<div class="author"><span class="label">Uploader: </span> 2hh8899</div>
</div>
<p class="desc">
It lights up the souvenirs for making them easier to find.유실물을 찾기 쉽게 하기 위해 빛기둥을 박아넣었습니다. </p>
</div>
</div>
<div class="tile-data">
<ul class="clearfix">
<li class="sizecount inline-flex">
<svg title="" class="icon icon-filesize"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-filesize"></use></svg> <span class="flex-label">
59KB </span>
</li>
<li class="endorsecount inline-flex">
<svg title="" class="icon icon-endorse"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-endorse"></use></svg> <span class="flex-label">22.7k</span>
</li>
<li class="downloadcount inline-flex">
<svg title="" class="icon icon-downloads"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-downloads"></use></svg> <span class="flex-label"> -- </span>
</li>
</ul>
</div>
</div>
<div class="mod-tile-right">
<div class="tile-desc">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>Souvenir's Light Pillar</h3>
<div class="meta clearfix">
<div class="category">
Visuals and Graphics
</div>
<time class="date" datetime="2018-09-03 19:58"> <span class="label">Uploaded: </span>
03 Sep 2018 </time>
<div class="date"><span class="label">Last Update:</span> 17 Jan 2020</div>
<div class="author"><span class="label">Author: </span>2hh8899</div>
</div>
<p class="desc">
It lights up the souvenirs for making them easier to find.유실물을 찾기 쉽게 하기 위해 빛기둥을 박아넣었습니다. </p>
</div>
</div>
</div>
</li>
<li class="mod-tile">
<ul class="gallery-hidden gallery">
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/43/43-1534824818-145235267.png" data-sub-html="MHW Transmog " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/43/43-1534824818-145235267.png" />
<li class="thumb" data-src="https://staticdelivery.nexusmods.com/mods/2531/images/43/43-1534825195-804021906.png" data-sub-html="MHW Transmog " data-exthumbimage="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/43/43-1534825195-804021906.png" />
</ul>
<div data-mod-id="43" data-game-id="2531" class="mod-tile-left ">
<div class="expandtile">
<ul class="btnexpand btnoverlay inline-flex">
<div class="padding"></div>
<svg title="" class="icon-plus"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-plus"></use></svg> <li>
<ul>
<li><a class="mod-view" href="index.html">View mod page</a></li>
<li><a class="mod-gallery" href="index.html">View image gallery</a></li>
</ul>
</li>
</ul>
</div>
<a class="mod-image" href="index.html">
<figure class="image_figure">
<img class="back" src="https://www.nexusmods.com/assets/images/default/noimage.svg" width="600" height="338">
<div class="fore_div_mods">
<img class="fore" onerror="imgError(this,'https://www.nexusmods.com/assets/images/default/noimage.svg')" loading="lazy" src="https://staticdelivery.nexusmods.com/mods/2531/images/thumbnails/43/43-1534824818-145235267.png" alt="MHW Transmog" title="MHW Transmog">
</div>
</figure>
</a>
<div class="tile-desc motm-tile">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>MHW Transmog</h3>
<div class="meta clearfix">
<div class="category">
Utilities
</div>
<time class="date" datetime="2018-08-21 05:37"> <span class="label">Uploaded: </span>
21 Aug 2018 </time>
<div class="date"><span class="label">Last Update:</span> 04 Dec 2020</div>
<div class="realauthor"><span class="label">Author: </span> Approved</div>
<div class="author"><span class="label">Uploader: </span> FineNerds</div>
</div>
<p class="desc">
A mod that allows you to hot swap your appearance with any armor of your choice. Visible to other players!As with any mod for games that don't support mods. This is USE AT YOUR OWN RISK. </p>
</div>
</div>
<div class="tile-data">
<ul class="clearfix">
<li class="sizecount inline-flex">
<svg title="" class="icon icon-filesize"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-filesize"></use></svg> <span class="flex-label">
260KB </span>
</li>
<li class="endorsecount inline-flex">
<svg title="" class="icon icon-endorse"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-endorse"></use></svg> <span class="flex-label">12.2k</span>
</li>
<li class="downloadcount inline-flex">
<svg title="" class="icon icon-downloads"><use xlink:href="https://www.nexusmods.com/assets/images/icons/icons.svg#icon-downloads"></use></svg> <span class="flex-label"> -- </span>
</li>
</ul>
</div>
</div>
<div class="mod-tile-right">
<div class="tile-desc">
<div class="fadeoff"></div>
<div class="tile-content">
<h3>MHW Transmog</h3>
<div class="meta clearfix">
<div class="category">
Utilities
</div>
<time class="date" datetime="2018-08-21 05:37"> <span class="label">Uploaded: </span>
21 Aug 2018 </time>
<div class="date"><span class="label">Last Update:</span> 04 Dec 2020</div>
<div class="author"><span class="label">Author: </span>FineNerds</div>
</div>
<p class="desc">
A mod that allows you to hot swap your appearance with any armor of your choice. Visible to other players!As with any mod for games that don't support mods. This is USE AT YOUR OWN RISK. </p>
</div>
</div>
</div>
</li>
So I want Selenium to open the first link inside the first element that has the class "mod-tile" and run a script that I have made, then open the link in the next element with the same class "mod-tile", and so on. Is there any way to specify this? (By the way, don't mind the descriptions; I just copied the first 3 mod tiles that appeared on the website.)
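You can get a list of elements with find_elements(By.CLASS_NAME, ...) (the older find_elements_by_class_name() helper was deprecated in Selenium 4 and later removed).
Then iterate through the list of elements.
[UPDATED]
The following should click the first link (.//a) within each "mod-tile" element:
from selenium.webdriver.common.by import By
elements = driver.find_elements(By.CLASS_NAME, "mod-tile")
print("mod-tile count {}".format(len(elements)))
for element in elements:
    # ".//a" searches inside this tile; a bare "//a" always matches the first link on the whole page
    element.find_element(By.XPATH, ".//a").click()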
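Clicking a link navigates away from the listing, so element references collected earlier may go stale. One possible workaround, sketched here under assumptions, is to collect every tile's first link up front and visit the URLs afterwards. The helper first_links, the base URL and the href values are all hypothetical (the saved page in the question only contains index.html placeholders), and BeautifulSoup is used for the extraction step so it can be run without a browser.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Simplified tiles; the href values are invented for this sketch.
html = """
<ul>
<li class="mod-tile"><div class="mod-tile-left">
<a class="mod-view" href="/mods/1982">View mod page</a>
<a class="mod-image" href="/mods/1982"></a></div></li>
<li class="mod-tile"><div class="mod-tile-left">
<a class="mod-view" href="/mods/112">View mod page</a></div></li>
</ul>
"""


def first_links(html, base_url):
    """Return the absolute URL of the first link in every li.mod-tile.

    first_links is a hypothetical helper used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tile in soup.find_all("li", class_="mod-tile"):
        link = tile.find("a", href=True)  # first <a> that has an href
        if link is not None:
            links.append(urljoin(base_url, link["href"]))
    return links


links = first_links(html, "https://example.com/")
print(links)
```

With Selenium, html would be driver.page_source, and each collected URL could then be opened in turn with driver.get(url) before running the per-page script.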
I'm trying to extract text from parent comments on the website songmeanings.com using BeautifulSoup from the following HTML:
<div class="text" id="comment-73014911864">
<strong class="title">
General Comment
</strong>
This is a beautiful song. I love it a lot. He is the ONLY, and yes, ONLY rapper I will listen to. Because,
<br/>
<br/>
(a) His songs have meaning. They're not about sex and cars and bling blingin' rims.
<br/>
(b) He has talent. He can actually rap. I don't think d12 is any good. =/
<br/>
<br/>
Anyway. I love this song and I'm getting his new CD right now... hehe.
<br/>
-Sarah
<div class="sign">
<a class="author" href="/profiles/view/17067478/" id="userprofile-17067478" rel="me nofollow" title="xoDonnieDarko">
xoDonnieDarko
</a>
<em class="date">
on December 06, 2005
</em>
<a href="/songs/view/3530822107858560012/?&specific_com=73014911864#comments" id="specific_com-73014911864" rel="nofollow" title="Permalink">
Link
</a>
</div>
<ul class="answers">
<li>
<div class="title">
<a class="replies close-replies" href="#" id="showreplies-73014911864" rel="nofollow" title="3 Replies">
3 Replies
</a>
<span class="login">
<a class="lightbox" href="#popup-loginform" rel="nofollow">
Log in to reply
</a>
</span>
<br>
</br>
</div>
<div id="formreply-73014911864" style="display: none;">
<!-- comment-form -->
<form action="#" class="comment-form-reply" id="comment-form-reply-73014911864">
<div class="area" id="reply-errors-box" style="display: none;">
<label for="type">
</label>
<span id="reply-errors" style="color: #ff0000;">
There was an error.
</span>
</div>
<div class="area">
<div class="textarea">
<div class="holder">
<div class="frame">
<textarea class="frmreplycomment-73014911864" id="frmreplycomment" name="frmreplycomment">
@xoDonnieDarko
</textarea>
</div>
</div>
</div>
</div>
<input id="frmreplylid" name="frmreplylid" type="hidden" value="3530822107858560012">
<input id="frmaid" name="frmaid" type="hidden" value="94">
<input id="frmreplycid" name="frmreplycid" type="hidden" value="73014911864">
<input class="submit" type="submit" value="Add reply"/>
</input>
</input>
</input>
</form>
</div>
<div id="thesereplies-73014911864" style="display: none;">
<div class="answer-holder" id="fullcomment-73015890665">
<a name="comment-73015890665">
</a>
<div id="rating-holder-73015890665">
<div class="numb-holder">
<span id="com-rating-73015890665">
<strong class="numb" id="numb-rating-73015890665">
+1
</strong>
</span>
<div class="com-whorated" id="com-whorated-73015890665" style="display: none; text-align: center;">
<span class="processing">
</span>
</div>
<div id="processing-73015890665" style="text-align: center; padding: 8px 8px 0px 12px; display: none;">
<span class="processing">
</span>
</div>
</div>
</div>
<div class="text">
i agree he is the only rapper i can listen too.
<div class="sign">
<span id="flagspan-73015890665">
<a class="flag" href="#" id="flag-73015890665">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17374833/" id="userprofile-17374833" rel="me nofollow" title="byrdman1992">
byrdman1992
</a>
<em class="date">
on March 15, 2010
</em>
</div>
</div>
</div>
<div class="answer-holder" id="fullcomment-73015961779">
<a name="comment-73015961779">
</a>
<div id="rating-holder-73015961779">
<div class="numb-holder">
<span id="com-rating-73015961779">
<strong class="numb" id="numb-rating-73015961779">
0
</strong>
</span>
<div class="com-whorated" id="com-whorated-73015961779" style="display: none; text-align: center;">
<span class="processing">
</span>
</div>
<div id="processing-73015961779" style="text-align: center; padding: 8px 8px 0px 12px; display: none;">
<span class="processing">
</span>
</div>
</div>
</div>
<div class="text">
same her the ONLY one...and sometimes lil' wayne! lol
<div class="sign">
<span id="flagspan-73015961779">
<a class="flag" href="#" id="flag-73015961779">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17418133/" id="userprofile-17418133" rel="me nofollow" title="dancer017">
dancer017
</a>
<em class="date">
on August 26, 2010
</em>
</div>
</div>
</div>
<div class="answer-holder" id="fullcomment-73016306033">
<a name="comment-73016306033">
</a>
<div id="rating-holder-73016306033">
<div class="numb-holder">
<span id="com-rating-73016306033">
<strong class="numb" id="numb-rating-73016306033">
0
</strong>
</span>
<div class="com-whorated" id="com-whorated-73016306033" style="display: none; text-align: center;">
<span class="processing">
</span>
</div>
<div id="processing-73016306033" style="text-align: center; padding: 8px 8px 0px 12px; display: none;">
<span class="processing">
</span>
</div>
</div>
</div>
<div class="text">
<a href="/profiles/view/17067478/?mention=12eeb84af5d911243541dc3bf651fc7b" id="userprofile-17067478" rel="me nofollow" title="@xoDonnieDarko">
@xoDonnieDarko
</a>
RIttz is pretty good.. Can listen to yela and tech too.
<div class="sign">
<span id="flagspan-73016306033">
<a class="flag" href="#" id="flag-73016306033">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17643918/" id="userprofile-17643918" rel="me nofollow" title="Heeltoehole">
Heeltoehole
</a>
<em class="date">
on September 05, 2015
</em>
</div>
</div>
</div>
</div>
</li>
</ul>
</div>
<div class="text">
i agree he is the only rapper i can listen too.
<div class="sign">
<span id="flagspan-73015890665">
<a class="flag" href="#" id="flag-73015890665">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17374833/" id="userprofile-17374833" rel="me nofollow" title="byrdman1992">
byrdman1992
</a>
<em class="date">
on March 15, 2010
</em>
</div>
</div>
<div class="text">
same her the ONLY one...and sometimes lil' wayne! lol
<div class="sign">
<span id="flagspan-73015961779">
<a class="flag" href="#" id="flag-73015961779">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17418133/" id="userprofile-17418133" rel="me nofollow" title="dancer017">
dancer017
</a>
<em class="date">
on August 26, 2010
</em>
</div>
</div>
<div class="text">
<a href="/profiles/view/17067478/?mention=12eeb84af5d911243541dc3bf651fc7b" id="userprofile-17067478" rel="me nofollow" title="@xoDonnieDarko">
@xoDonnieDarko
</a>
RIttz is pretty good.. Can listen to yela and tech too.
<div class="sign">
<span id="flagspan-73016306033">
<a class="flag" href="#" id="flag-73016306033">
Flag
</a>
</span>
<a class="author" href="/profiles/view/17643918/" id="userprofile-17643918" rel="me nofollow" title="Heeltoehole">
Heeltoehole
</a>
<em class="date">
on September 05, 2015
</em>
</div>
</div>
Using this code I am able to extract most text from the comments, but any comments with line breaks will have missing content:
import urllib2
from bs4 import BeautifulSoup
url = "http://songmeanings.com/songs/view/3530822107858560012/"
response = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')
for strong_tag in soup.find_all('strong'):
    print strong_tag.next_sibling
Which gives the output:
This is a beautiful song. I love it a lot. He is the ONLY, and yes, ONLY rapper I will listen to. Because,
What I want is:
This is a beautiful song. I love it a lot. He is the ONLY, and yes, ONLY rapper I will listen to. Because,
(a) His songs have meaning. They're not about sex and cars and bling blingin' rims.
(b) He has talent. He can actually rap. I don't think d12 is any good. =/
Anyway. I love this song and I'm getting his new CD right now... hehe.
-Sarah
How can I extract all text from a parent comment? Is there a better way to do this than using the strong tag?
I slightly modified https://stackoverflow.com/a/11809215/42346 (give him an upvote!) to get this solution:
def loop_until(text,first_elem):
    try:
        text += first_elem.string
        if first_elem.next == first_elem.find_next('div'):
            return text
        else:
            return loop_until(text,first_elem.next.next)
    except TypeError:
        pass
Call it like this:
next_elem = soup.find_all('strong')[0].nextSibling
loop_until('',next_elem)
Result:
u"\n This is a beautiful song. I love it a lot. He is the ONLY, and yes, ONLY rapper I will listen to. Because,\n \n\n (a) His songs have meaning. They're not about sex and cars and bling blingin' rims.\n \n (b) He has talent. He can actually rap. I don't think d12 is any good. =/\n \n\n Anyway. I love this song and I'm getting his new CD right now... hehe.\n \n -Sarah\n "
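A non-recursive variant is also possible. The sketch below is not the answer's method: it walks the direct children of the comment's div and keeps only the bare text nodes, which skips the strong title, the br tags, the signature and the nested replies. The helper name parent_comment_lines is made up, and the HTML is a shortened copy of the question's comment.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the parent comment from the question.
html = """<div class="text" id="comment-1">
<strong class="title">General Comment</strong>
This is a beautiful song. Because,
<br/>
<br/>
(a) His songs have meaning.
<br/>
-Sarah
<div class="sign"><a class="author" href="#">xoDonnieDarko</a></div>
<ul class="answers"><li><div class="text">a reply</div></li></ul>
</div>"""


def parent_comment_lines(comment_div):
    """Collect the non-empty text nodes that are direct children of the div.

    parent_comment_lines is a hypothetical helper used only for this sketch.
    """
    lines = []
    for child in comment_div.children:
        # Tags (strong, br, div.sign, ul.answers) are skipped; only text stays.
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:
                lines.append(text)
    return lines


soup = BeautifulSoup(html, "html.parser")
comment = soup.find("div", class_="text")  # first match is the parent comment
lines = parent_comment_lines(comment)
print("\n".join(lines))
```

Because only direct children are inspected, text inside the replies never leaks into the parent comment, and the loop does not depend on the strong tag being present.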
I am trying to extract a table from HTML that is not in the usual td/tr format and includes images.
The html code looks like this:
<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>
I would like to get the table:
[Location 45 out of fifty points,
Service 45 out of fifty points].
The following code only prints "Location" and "Service" and does not include the rating.
for url in urls:
    r=requests.get(url)
    time.sleep(delayTime)
    soup=BeautifulSoup(r.content, "lxml")
    data17= soup.findAll('div', {'class' :'dynamicBottom'})
    for item in (data17):
        print(item.text)
And the code
data18= soup.find(attrs={'class': 'sprite-rating_s_fill rating_s_fill s45'})
print(data18["alt"] if data18 else "No meta title given")
does not help either: it only prints "45 out of fifty points", without showing which category the rating belongs to. Additionally, the image class ('sprite-rating_s_fill rating_s_fill s45') varies between tables depending on the rating.
Is there a way to extract the full table?
Or to tell Python to extract the image after a certain word, e.g. "Location"?
Thank you very much for your help!
html = '''<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="ratingRow wrap"):
    text = div.text.strip()
    alt = div.find('img').get('alt')
    print(text, alt)
out:
Location 45 out of fifty points
Service 45 out of fifty points
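If the ratings are needed as data rather than printed lines, the same row-by-row idea can fill a dictionary. This is a minimal sketch under assumptions: ratings_table is a made-up helper name, the markup is simplified, and the second row's s40 class and "40 out of fifty points" text are invented to show that the lookup does not depend on the image class.

```python
from bs4 import BeautifulSoup

# Simplified rows; the s40 / "40 out of fifty points" row is invented.
html = """
<ul class="barChart"><li>
<div class="ratingRow wrap">
<div class="label part "><span class="text">Location</span></div>
<div class="wrap row part "><span class="rate">
<img class="rating_s_fill s45" alt="45 out of fifty points"></span></div>
</div>
<div class="ratingRow wrap">
<div class="label part "><span class="text">Service</span></div>
<div class="wrap row part "><span class="rate">
<img class="rating_s_fill s40" alt="40 out of fifty points"></span></div>
</div>
</li></ul>
"""


def ratings_table(html):
    """Map each category label to the alt text of the image in the same row.

    ratings_table is a hypothetical helper used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = {}
    for row in soup.find_all("div", class_="ratingRow"):
        label = row.find("span", class_="text")
        img = row.find("img", alt=True)  # any image that has an alt attribute
        if label is not None and img is not None:
            table[label.get_text(strip=True)] = img["alt"]
    return table


table = ratings_table(html)
print(table)
```

Matching the image by the presence of an alt attribute, rather than by its class, sidesteps the problem that the class changes with the rating.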