Is it possible to make a Beautifulsoup Strainer that strains all 'order-cards' from 'container-01' only (without 'order-cards' from other containers)?
Below the sample HTML
<div class="items-container" container-id="container-01">
<div class="order-card">order_01
<div class="item-card">item1</div>
<div class="item-card">item2</div>
<div class="item-card">item3</div>
<div class="item-card">item4</div>
</div>
<div class="order-card">order_02
<div class="item-card">itemA</div>
<div class="item-card">itemB</div>
<div class="item-card">itemC</div>
<div class="item-card">itemD</div>
</div>
<div class="order-card">order_03
<div class="item-card">itemW</div>
<div class="item-card">itemX</div>
<div class="item-card">itemY</div>
<div class="item-card">itemZ</div>
<div class="item-card">item</div>
</div>
</div>
<div class="items-container" container-id="container-02">
<div class="order-card">order_53
<div class="item-card">item_7</div>
<div class="item-card">item_8</div>
</div>
</div>
<div class="items-container" container-id="container-03">
<div class="order-card">order_13
<div class="item-card">item_16</div>
<div class="item-card">item_17</div>
<div class="item-card">item_18</div>
</div>
</div>
What I have so far is the code below which strains ALL 'order-cards' from ALL containers.
The goal is that 'page_soup' contains ALL 'order-card' items that are in 'container-01' only.
The following loop then uses that 'page_soup' to iterate through each item in 'order-card' to get the details from each 'item-card'.
rephrased above!
The goal is to get the details from each 'item-card' that are in 'container-01' only.
There is no need for parsing any other containers than 'container-01'.
only_item_cells = SoupStrainer('div', attrs={"class":"order-card"})
page_soup = BeautifulSoup(page_html, 'html.parser', parse_only=only_item_cells)
Following that is a loop that gets the details from ALL the 'item-cards' in ALL containers. In fact, that is NOT wanted, as the output includes items from containers other than 'container-01' only.
Running Python 3.8.8, on Anaconda, Win64
Use the appropriate attribute as you have indicated:
only_item_cells = SoupStrainer('div', attrs= {"container-id": "container-01"})
(1)
</div>
<div class="n_cont5" id="nct7">
<div class="nc_tit">说明书:</div>
<div class="nc5" id="smsdiv">
正在查询请稍候......
</div>
</div>
<div class="n_page">上一篇第<span class="cur">2</span>篇下一篇共<span>53</span>篇转到第
<input type="text" name="pages" id="pages"
onkeydown="return SubmitKeyClick(this,event)"
onkeyup="value=value.replace(/[^\d]/g,'')"
onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"/>
篇</div>
</div>
(2)
<a href="javascript:noAction()" title="PDF下载"
onclick="pdfDownloadDetail('Unexamined_patent_for_invention/2016/20160330/CN105452223A/PDF_PID/CN112014000037041CN00001054522230APDFZH20160330CN00F.PDF,CN201480037041.6')" href="javascript:noAction()">PDF下载</a>
</dd>
</dl>
</div>
</li>
<li>打印</li>
<li><a class="icon7" href="javascript:noAction();" class="zidongfanyi"
onclick="translateToEn('CN201480037041.6', 'FMZL_EN,SYXX_EN')">中译英</a></li>
</ul>
<div class="clear"></div>
</div>
<div class="clear"></div>
</div>
<div class="n_page">上一篇第<span class="cur">2</span>篇下一篇共<span>53</span>篇转到第
<input type="text" name="pages" id="pages"
onkeydown="return SubmitKeyClick(this,event)"
onkeyup="value=value.replace(/[^\d]/g,'')"
onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"/>
篇</div>
Here are two very similar content in the html content. Iwant to get the number "53" from the first?. I used the below code which doesn't work. I also try from div class, but it also failed. How can I get the number "53" from the first html content?
html.xpath('//a[contains(text(),"下一篇")]/span/text()')
Why it didn't work : the span holding 53 is a sibling (not a child) of the a element.
To complete #super.single430's answer, here's an alternative (if encoding issues occur during the parsing process) :
//span[#class="cur"]/following-sibling::span/text()
html.xpath("//a[contains(text(),'下一篇')]/following-sibling::span/text()")
I've been building a web scraper in BS4 and have gotten stuck. I am using Trip Advisor as a test for other data I will be going after, but am not able to isolate the tag of the 'entire' reviews. Here is an example:
https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html
Notice in the first review, there is an icon below "the wine list is...". I am able to easily isolate the partial reviews, but have not been able to figure out a way to get BS4 to pull the reviews after a simulated 'More' click. I'm trying to figure out what tool(s) are needed for this? Do I need to use selenium instead?
The original element looks like this:
<span class="partnerRvw">
<span class="taLnk hvrIE6 tr475091998 moreLink ulBlueLinks" onclick=" ta.util.cookie.setPIDCookie(4444); ta.call('ta.servlet.Reviews.expandReviews', {type: 'dummy'}, ta.id('review_475091998'), 'review_475091998', '1', 4444);
">
More </span>
<span class="ui_icon caret-down"></span>
</span>
Looking at the HTML after you click on the More link you would find a new dynamically added class that has a with the information I need (see below):
<div class="review dyn_full_review inlineReviewUpdate provider0 first newFlag" style="display: block;">
<a name="UR475091998" class=""></a>
<div id="UR475091998" class="extended provider0 first newFlag">
<div class="col1of2">
<div class="member_info">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-SRC_475091998" class="memberOverlayLink" onmouseover="requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'user_name_photo');" data-anchorwidth="90">
<div class="avatar profile_6875524F623CC948F4F9CA95BB4A9567 ">
<a onclick="">
<img src="https://media-cdn.tripadvisor.com/media/photo-l/0d/97/43/bf/joannecarpenter.jpg" class="avatar potentialFacebookAvatar avatarGUID:6875524F623CC948F4F9CA95BB4A9567" width="74" height="74">
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname mbrName_6875524F623CC948F4F9CA95BB4A9567" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">joannecarpenter</span>
</div>
</div>
<div class="location">
Humble, Texas
</div>
</div>
<div class="memberBadging g10n">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-CONT" class="no_cpu" onclick="ta.util.cookie.setPIDCookie('15984'); requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'review_count');" data-anchorwidth="90">
<div class="levelBadge badge lvl_02">
Level <span><img src="https://static.tacdn.com/img2/badges/20px/lvl_02.png" alt="" class="icon" width="20" height="20/"></span> Contributor </div>
<div class="reviewerBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/rev_03.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 reviews</span> </div>
<div class="contributionReviewBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/Foodie.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 restaurant reviews</span>
</div>
</div>
</div>
</div>
<div class="col2of2">
<div class="innerBubble">
<div class="quote">“<span class="noQuotes">Dinner</span>”</div>
<div class="rating reviewItemInline">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s50" width="70" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="ratingDate relativeDate" title="April 12, 2017">Reviewed 3 days ago
<span class="new redesigned">NEW</span> </span>
<a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24687)">
<span class="ui_icon mobile-phone"></span>
via mobile
</a>
</div>
<div class="entry">
<p>
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
</p>
</div>
<div class="rating-list">
<div class="recommend">
<span class="recommend-titleInline noRatings">Visited April 2017</span>
</div>
</div>
<div class="expanded lessLink">
<span class="taLnk collapse ulBlueLinks no_cpu ">
Less
</span>
<span class="textArrow_more ui_icon caret-up"></span>
</div>
<div id="helpfulq475091998_expanded" class="helpful redesigned white_btn_container ">
<span class="isHelpful">Helpful?</span> <div class="tgt_helpfulq475091998 rnd_white_thank_btn" onclick="ta.call('ta.servlet.Reviews.helpfulVoteHandlerOb', event, this, 'LeJIVqd4EVIpECri1GII2t6mbqgqguuuxizSxiniaqgeVtIJpEJCIQQoqnQQeVsSVuqHyo3KUKqHMdkKUdvqHxfqHfGVzCQQoqnQQZiptqH5paHcVQQoqnQQrVxEJtxiGIac6XoXmqoTpcdkoKAUAAv0tEn1dkoKAUAAv0zH1o3KUK0pSM13vkooXdqn3XmffAdvqndqnAfbAo77dbAo3k0npEEeJIV1K0EJIVqiJcpV1U0Ii9VC1rZlU3XozxbZZxE2crHN2TDUJiqnkiuzsVEOxdkXqi7TxXpUgyR2xXvOfROwaqILkrzz9MvzCxMva7xEkq8xXNq8ymxbAq8AzzrhhzCxbx2vdNvEn2fnwEfq8alzCeqi53ZrgnMrHhshTtowGpNSmq89IwiVb7crUJxdevaCnJEqI33qiE5JGErJExXKx5ooItGCy5wnCTx2VA7RvxEsO3'); ta.trackEventOnPage('HELPFUL_VOTE_TEST', 'helpfulvotegiven_v2');">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_white.png" class="helpful_thumbs_up white">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_green.png" class="helpful_thumbs_up green">
<span class="helpful_text">Thank joannecarpenter</span> </div>
</div>
<div class="tooltips vertically_centered">
<div class="reportProblem">
<span id="ReportIAP_475091998" class="problem collapsed taLnk" onclick="ta.trackEventOnPage('Report_IAP', 'Report_Button_Clicked', 'member'); ta.call('ta.servlet.Reviews.iapFlyout', event, this, '475091998')" onmouseover="if (!this.getAttribute('data-first')) {ta.trackEventOnPage('Reviews', 'report_problem', 'hover_over_flag'); this.setAttribute('data-first', 1)} uiOverlay(event, this)" data-tooltip="" data-position="above" data-content="Problem with this review?">
<img src="https://static.tacdn.com/img2/icons/gray_flag.png" width="13" height="14" alt="">
<span class="reportTxt">Report</span> </span>
</div>
</div>
<div class="userLinks">
<div class="sameGeoActivity">
<a href="/members-citypage/joannecarpenter/g56010" target="_blank" onclick="ta.setEvtCookie('Reviews','more_reviews_by_user','',0,this.href); ta.util.cookie.setPIDCookie(19160)">
See all 5 reviews by joannecarpenter for Humble </a>
</div>
<div class="askQuestion">
<span class="taLnk ulBlueLinks" onclick="ta.trackEventOnPage('answers_review','ask_user_intercept_click' ); ta.load('ta-answers', (function() {require('answers/misc').askReviewerIntercept(this, '470148', 'joannecarpenter', '6875524F623CC948F4F9CA95BB4A9567', 'en', '475091998','Chez Nous', 39151)}).bind(this), true);">Ask joannecarpenter about Chez Nous</span>
</div>
</div>
<div class="note">
This review is the subjective opinion of a TripAdvisor member and not of TripAdvisor LLC. </div>
<div class="duplicateReviewsInline">
<div class="previous">joannecarpenter has 1 more review of Chez Nous</div> <ul class="dupReviews">
<li class="dupReviewItem">
<div class="reviewTitle">
“Joanne Carpenter”
</div>
<div class="rating">
<span class="rate sprite-rating_ss rating_ss"> <img class="sprite-rating_ss_fill rating_ss_fill ss50" width="50" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="date">Reviewed January 18, 2017</span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="large">
</div>
<div class="ad iab_inlineBanner">
<div id="gpt-ad-468x60" class="adInner gptAd"></div>
</div>
</div>
Is there a way for BS4 to handle this for me?
Here's a simple example to get you started:
import selenium
from selenium import webdriver
driver = webdriver.PhantomJS()
url = "https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html"
driver.get(url)
elem = driver.get_element_by_class_name("taLnk")
...
You could find more info about the methods here:
http://selenium-python.readthedocs.io/
In all likelihood you will need to examine a few more of these pages, to identify variations in the HTML code. For the sample you have offered, and given that you are able to obtain it by simulating a press, the following code works to select the paragraph that you seem to want.
from bs4 import BeautifulSoup
HTML = open('temp.htm').read()
soup = BeautifulSoup(HTML, 'lxml')
para = soup.select('.entry > p')
print (para[0].text)
Result:
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
Note that there are newlines before and after the paragraph.
Need to exercise buttons ( edit / delete) in following HTML. Unique between all devices are "device-name". Hence locating target device is straight as -
cmd = "//div[#class='device-name' and text()='%s']" % (devicename)
element = brHandle.find_elements_by_xpath(cmd)
HTML -
<div class="device" id="device-1">..</div>
<div class="device" id="device-2">..</div>
<div class="device-form-factor desktop">
<div class="device-platform unknown"></div>
<div class="device-status">
....
</div>
</div>
<div class="device-name">auto-generated</div>
<div class="buttons">
<div class="button edit" href="#">Edit</div>
<div class="button delete" href="#">Delete</div>
</div>
</div>
<div class="device" id="device-3">..</div>
....
To access parent and then button, i tried following- but didn't work out. I was expecting to fetch device-id and then form my xpath to edit/delete button. Please suggest -
parent = element.find_element_by_xpath("..").text()
I am trying to send a message to my FB friends using selenium in python. I've tried Element by Xpath, CSS Selector, Class but it doesn't work please help me how it would be I'm trying using
Message_button = browser.find_element_by_id("u_0_t")
Message_button.click()
time.sleep(10)
message = browser.find_element_by_xpath("//textarea[#class='luiTextareaAutogrow _552m']")
message.send_keys("Hi this is Mark how are you..!")
message.submit()
time.sleep(10)
Here is the FB Code
<div class="fbNubFlyoutFooter">
<div class="_552h _n4k">
<textarea style="height: 15px;" class="uiTextareaAutogrow _552m" data-ft="{"tn":"+N"}" onkeydown="run_with(this, ["legacy:control-textarea"], function() {TextAreaControl.getInstance(this)});"></textarea>
</div>
<div style="right: 0px;" class="_552n">
<div class="_10nr"></div>
<div class="_6a _552o">
<form action="https://upload.facebook.com/ajax/mercury/upload.php" class="_vzk" title="Add Photos" method="post" data-reactid=".12">
<input name="attach_id" data-reactid=".12.0" type="hidden">
<input name="images_only" value="true" data-reactid=".12.1" type="hidden">
<div class="_m _4q60 _3rzn _6a" data-reactid=".12.2"><a tabindex="0" class="_4q61 _5f0v _509v" data-reactid=".12.2.0"><i class="_509w" alt="Camera" data-reactid=".12.2.0.0"></i></a>
<input class="_n" name="attachment[]" multiple="" accept="image/*" title="Add Photos" data-reactid=".12.2.1" type="file">
</div>
</form>
</div>
<div class="uiToggle _1tn3 emoticonsPanel" data-ft="{"tn":"+O"}"><a class="_4vui" tabindex="0" title="Choose a sticker or emoticon" rel="toggle" role="button"><i class="_4l9x"></i></a>
<div class="_5r8f panelFlyout uiToggleFlyout">
<div class="_5r8p">
<div style="padding:30px;text-align:center;" data-reactid=".13"><span class="img _55ym _55yq _55yo _5d9-" aria-label="Loading..." aria-busy="true" data-reactid=".13.0"></span>
</div>
</div>
<div class="panelFlyoutArrow"></div>
</div>
</div>
<div class="lfloat"></div>
<div class="_5g2o"><a tabindex="0" aria-label="Send a Like" class="_3s0d" title="Send a Like" data-reactid=".10"><i class="_2y9i _30yy" data-reactid=".10.0"></i></a>
</div>
</div>
</div>
Please help me am I using the correct method
you can use css_selector and Keys.ENTER as follows:
# enter your text by using css selector
driver.find_element_by_css_selector(".uiTextareaAutogrow._552m").send_keys("Merhaba, nasilsiniz?")
# you can send your message by sending native keyboard ENTER
driver.find_element_by_css_selector(".uiTextareaAutogrow._552m").send_keys(Keys.ENTER)