The HTML is below. If the span value is less than 20%, I want to remove the span and its ancestors up to and including the <div class="action"> parent, but nothing above that.
So for example:
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
</div>
</div>
From the above HTML, only this code should be removed:
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
So what should be left is:
<div class="item">
<div class="info">
</div>
</div>
This is my current python code:
items = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='content-name']")))
for item in items:
    percentage_text = re.findall(r"\d+", item.text)[0]
    if int(percentage_text) <= 20:
        driver.execute_script("arguments[0].remove();", item)
But it only removes the span itself and not its parents.
Here is the full HTML. I think it needs JavaScript to remove the elements, but I am very new to JavaScript. I have researched for more than 2 hours and still can't find a solution. Thank you very much.
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 5% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 95% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 32% </span>
</div>
</div>
</div>
</div>
<div class="item">
<div class="info">
<div class="action">
<div class="content">
<span class="content-name"> 15% </span>
</div>
</div>
</div>
</div>
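Go up to the parent of the parent. The span's parent is <div class="content">, and that div's parent is <div class="action">, so removing the grandparent removes the whole block: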
driver.execute_script("arguments[0].parentElement.parentElement.remove();", item)
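If the nesting depth is not always the same, a variation is to let the browser find the nearest div.action ancestor with the DOM method Element.closest instead of counting parentElement hops. This is only a sketch under assumptions: the helper names parse_percentage and remove_low_items are made up for illustration, and the driver and spans arguments stand for the Selenium driver and the list of located span elements from the question.

```python
import re

# JavaScript run in the browser: find the nearest <div class="action">
# ancestor of the span and remove it together with everything inside.
REMOVE_ACTION_JS = (
    "const action = arguments[0].closest('div.action');"
    "if (action) { action.remove(); }"
)


def parse_percentage(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def remove_low_items(driver, spans, threshold=20):
    """Remove the div.action ancestor of every span whose value is <= threshold.

    Hypothetical helper: `driver` needs an execute_script method and each
    span needs a .text attribute, as Selenium objects have.
    Returns the number of removal scripts that were executed.
    """
    removed = 0
    for span in spans:
        value = parse_percentage(span.text)
        if value is not None and value <= threshold:
            driver.execute_script(REMOVE_ACTION_JS, span)
            removed += 1
    return removed
```

With the question's code this would be called as remove_low_items(driver, items). Spans without any digits are skipped rather than raising an IndexError.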
I'm struggling with scraping a few pages. It happens when the structure of the page involves a lot of nested divs.
Here is the page's HTML:
<div>
<section class="ui-accordion-header ui-state-default ui-corner-all ui-accordion-icons" role="tab" id="ui-id-1" aria-controls="ui-id-2" aria-selected="false" aria-expanded="false" tabindex="0"><span class="ui-accordion-header-icon ui-icon ui-icon-triangle-1-e"></span>
<div class="detail-avocat">
<div class="nom-avocat">Me <span class="avocat_name">NAME </span></div>
<div class="type-avocat">Avocat postulant au Tribunal Judiciaire</div>
</div>
<div class="more-info">Plus d'informations</div>
</section>
<div class="ui-accordion-content ui-helper-reset ui-widget-content ui-corner-bottom" style="display: none;" id="ui-id-2" aria-labelledby="ui-id-1" role="tabpanel" aria-hidden="true">
<div class="details">
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Structure :</span>
<div>
<p>Cabinet individuel NAME</p>
</div>
</div>
</div>
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Adresse :</span>
<div>
<p>21 rue Belle Isle 57000 VILLE</p>
</div>
</div>
</div>
<div class="detail-avocat-row ">
<div class="detail-avocat-content overflow-h">
<span>Mail :</span>
<div>
<p>cabinet@mail.fr</p>
</div>
</div>
</div>
<div class="detail-avocat-row">
<div class="detail-avocat-content overflow-h">
<span>Tél :</span>
<div>
<p>Telnum</p>
</div>
</div>
</div>
<div class="detail-avocat-row">
<div class="detail-avocat-content overflow-h">
<span>Fax :</span>
<div>
<p> </p>
</div>
</div>
</div>
<div class="contact-avocat"> Contacter </div>
</div>
</div>
</div>
And here is my python code:
divtel = self.driver.find_elements(by=By.XPATH,
                                   value='//div[@class="detail-avocat-content overflow-h"]/div/p')
for p in divtel:
    print(p.text)
It doesn't print anything. With other similar pages it prints the text, but in this case it doesn't, although there is text in the nested span and div/p. Do you know why?
How can I resolve my problem, please?
Thank you.
The .text property only returns text when the web element containing it is visible on the page. Here the panel that holds the <p> elements has style="display: none;", so .text comes back empty. For a hidden element you have to use .get_attribute('innerText'), .get_attribute('textContent') or .get_attribute('innerHTML') instead. So, for example, change
print(p.text)
to
print(p.get_attribute('innerText'))
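If the same scraper runs against pages where the element is sometimes visible and sometimes hidden, a small fallback helper may be convenient. This is a sketch only: element_text is a hypothetical name, and it assumes the argument behaves like a Selenium WebElement (a .text attribute and a get_attribute method).

```python
def element_text(element):
    """Return the element's text whether or not it is currently displayed.

    Hypothetical helper: tries .text first (rendered text of visible
    elements) and falls back to the textContent property, which is
    available for hidden elements as well.
    """
    visible = element.text
    if visible:
        return visible.strip()
    hidden = element.get_attribute("textContent")
    return (hidden or "").strip()
```

In the loop from the question it would be used as print(element_text(p)).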
I'm trying to scrape this page with BeautifulSoup.
https://www.nb.co.za/en/view-book/?id=9780798182539
How do I target specific elements if they don't have a unique class or id?
Is it possible to scrape a div based on the value/text in its sibling div?
For instance, in the code below, how can I get 9780798182539 given that the sibling div contains <p>ISBN:</p>?
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>ISBN:</p>
</div>
<div class="col-md-9 noPadding">
9780798182539
</div>
</div>
Here is the complete html:
<div class="col-lg-7 col-md-12 col-sm-12 author-details">
<h2>Step by Step: Counting to 50 </h2>
<h5>
Cuberdon
</h5>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>ISBN:</p>
</div>
<div class="col-md-9 noPadding">
9780798182539
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Publisher:</p>
</div>
<div class="col-md-9 noPadding">
Human & Rousseau
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Date Released:</p>
</div>
<div class="col-md-9 noPadding">
November 2021
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Price (incl. VAT):</p>
</div>
<div class="col-md-9 noPadding">
R 120.00
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Format:</p>
</div>
<div class="col-md-9 noPadding">
<p>Hard cover, 32pp</p>
</div>
</div>
</div>
You can use :-soup-contains to target the p tag by its text. Wrap it in the :has pseudo-class selector, and specify a direct parent-child relationship with the child combinator >, to get the immediate parent div. Then add an adjacent sibling combinator +, with a div type selector, to move to the adjacent div:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://www.nb.co.za/nb/view-book?id=9780798182539')
soup = bs(r.content, 'lxml')
print(soup.select_one('div:has(> p:-soup-contains("ISBN:")) + div').text.strip())
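The same selector can be tried without a network request by running it on a shortened copy of the HTML from the question. In this sketch the function name value_for_label is made up for illustration, and html.parser is used only to avoid needing lxml. Note that :-soup-contains does a substring match on the element's text.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
SAMPLE = """
<div class="row clearfix">
  <div class="col-md-3 noPadding"><p>ISBN:</p></div>
  <div class="col-md-9 noPadding"> 9780798182539 </div>
</div>
<div class="row clearfix">
  <div class="col-md-3 noPadding"><p>Publisher:</p></div>
  <div class="col-md-9 noPadding"> Human &amp; Rousseau </div>
</div>
"""


def value_for_label(soup, label):
    """Return the text of the div that directly follows the label's div."""
    # div:has(> p:-soup-contains(...)) picks the div whose direct child <p>
    # contains the label; "+ div" then steps to its next sibling div.
    selector = f'div:has(> p:-soup-contains("{label}")) + div'
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None


soup = BeautifulSoup(SAMPLE, "html.parser")
isbn = value_for_label(soup, "ISBN:")
publisher = value_for_label(soup, "Publisher:")
print(isbn)
print(publisher)
```

Returning None for a missing label avoids the AttributeError that .text raises when select_one finds nothing.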
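Here is a working solution so far. Note that find returns only the first div with that class, so this works because the ISBN row comes first in the HTML.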
from bs4 import BeautifulSoup
html = '''
<div class="col-lg-7 col-md-12 col-sm-12 author-details">
<h2>Step by Step: Counting to 50 </h2>
<h5>
Cuberdon
</h5>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>ISBN:</p>
</div>
<div class="col-md-9 noPadding">
9780798182539
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Publisher:</p>
</div>
<div class="col-md-9 noPadding">
Human & Rousseau
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Date Released:</p>
</div>
<div class="col-md-9 noPadding">
November 2021
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Price (incl. VAT):</p>
</div>
<div class="col-md-9 noPadding">
R 120.00
</div>
</div>
<div class="row clearfix">
<div class="col-md-3 noPadding">
<p>Format:</p>
</div>
<div class="col-md-9 noPadding">
<p>Hard cover, 32pp</p>
</div>
</div>
</div>
'''
soup = BeautifulSoup(html, "html.parser")
div_text = soup.find('div', class_="col-md-9 noPadding")
print(div_text.get_text(strip=True))
Output:
9780798182539
You could do a find_all on the main divs with class row clearfix, then filter for the divs that contain the string ISBN:, and do a find on each of those for the div with class col-md-9 noPadding. It would look like this as a list comprehension:
[i.find('div', class_='col-md-9 noPadding').get_text().strip() for i in soup.find_all('div', class_='row clearfix') if 'ISBN:' in i.get_text()][0]
Output:
9780798182539
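Along the same lines, if more than the ISBN is needed, each row could be turned into a label/value pair and collected in a dict. This is only a sketch: book_details is a hypothetical helper name, the sample is a shortened copy of the HTML from the question, and it assumes every row keeps the col-md-3 / col-md-9 layout.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
SAMPLE = """
<div class="col-lg-7 col-md-12 col-sm-12 author-details">
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>ISBN:</p></div>
    <div class="col-md-9 noPadding"> 9780798182539 </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>Date Released:</p></div>
    <div class="col-md-9 noPadding"> November 2021 </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>Format:</p></div>
    <div class="col-md-9 noPadding"><p>Hard cover, 32pp</p></div>
  </div>
</div>
"""


def book_details(html):
    """Map each row's label (without the trailing colon) to its value."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for row in soup.find_all("div", class_="row clearfix"):
        label_div = row.find("div", class_="col-md-3")
        value_div = row.find("div", class_="col-md-9")
        if label_div and value_div:
            label = label_div.get_text(strip=True).rstrip(":")
            details[label] = value_div.get_text(strip=True)
    return details


details = book_details(SAMPLE)
print(details["ISBN"])
```

Looking values up by label keeps the code independent of the order of the rows.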
I am creating the dashboard view for my CRM. However, when displaying the cards, only two of the three cards are visible. Can anyone help me with this? Is this a code formatting issue?
I am adding an image of the dashboard for my CRM below as well as the code for the cards given below.
example.html:
{% extends 'base.html' %}
{% block content %}
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" />
<!-- Begin Page Content -->
<div class="container-fluid">
<!-- Page Heading -->
<div class="d-sm-flex align-items-center justify-content-between mb-4 mt-4">
<h1 class="h3 mb-0 text-gray-800">Welcome to NexCRM</h1>
<i class="fas fa-download fa-sm text-white-50"></i> Generate Report
</div>
<!-- Main Content Here -->
<div class="row">
<!-- Company Card Example -->
<div class="col-xl-3 col-md-6 mb-4">
<div class="card border-left-primary shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-primary text-uppercase mb-1">Companies</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">4,083</div>
</div>
<div class="col-auto">
<i class="fas fa-building fa-2x text-gray-300"></i>
</div>
</div>
</div>
</div>
</div>
<!-- Company Card Example -->
<div class="col-xl-3 col-md-6 mb-4">
<div class="card border-left-primary shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-primary text-uppercase mb-1">Companies</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">4,083</div>
</div>
<div class="col-auto">
<i class="fas fa-building fa-2x text-gray-300"></i>
</div>
</div>
</div>
</div>
</div>
<!-- Contact Card Example -->
<div class="col-xl-3 col-md-6 mb-4">
<div class="card border-left-warning shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-warning text-uppercase mb-1">Contacts</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">18,002</div>
</div>
<div class="col-auto">
<i class="fas fa-comments fa-2x text-gray-300"></i>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- Content Row -->
<div class="row">
<!-- Area Chart -->
<div class="col-xl-8 col-lg-7">
<div class="card shadow mb-4">
<!-- Card Header - Dropdown -->
<div class="card-header py-3 d-flex flex-row align-items-center justify-content-between">
<h6 class="m-0 font-weight-bold text-primary">Earnings Overview</h6>
<div class="dropdown no-arrow">
<a class="dropdown-toggle" href="#" role="button" id="dropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
<i class="fas fa-ellipsis-v fa-sm fa-fw text-gray-400"></i>
</a>
<div class="dropdown-menu dropdown-menu-right shadow animated--fade-in" aria-labelledby="dropdownMenuLink">
<div class="dropdown-header">Dropdown Header:</div>
<a class="dropdown-item" href="#">Action</a>
<a class="dropdown-item" href="#">Another action</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="#">Something else here</a>
</div>
</div>
</div>
<!-- Card Body -->
<div class="card-body">
<div class="chart-area">
<canvas id="myAreaChart"></canvas>
</div>
</div>
</div>
</div>
<!-- Pie Chart -->
<div class="col-xl-4 col-lg-5">
<div class="card shadow mb-4">
<!-- Card Header - Dropdown -->
<div class="card-header py-3 d-flex flex-row align-items-center justify-content-between">
<h6 class="m-0 font-weight-bold text-primary">Revenue Sources</h6>
<div class="dropdown no-arrow">
<a class="dropdown-toggle" href="#" role="button" id="dropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
<i class="fas fa-ellipsis-v fa-sm fa-fw text-gray-400"></i>
</a>
<div class="dropdown-menu dropdown-menu-right shadow animated--fade-in" aria-labelledby="dropdownMenuLink">
<div class="dropdown-header">Dropdown Header:</div>
<a class="dropdown-item" href="#">Action</a>
<a class="dropdown-item" href="#">Another action</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="#">Something else here</a>
</div>
</div>
</div>
<!-- Card Body -->
<div class="card-body">
<div class="chart-pie pt-4 pb-2">
<canvas id="myPieChart"></canvas>
</div>
<div class="mt-4 text-center small">
<span class="mr-2">
<i class="fas fa-circle text-primary"></i> Direct
</span>
<span class="mr-2">
<i class="fas fa-circle text-success"></i> Social
</span>
<span class="mr-2">
<i class="fas fa-circle text-info"></i> Referral
</span>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- /.container-fluid -->
{% endblock content %}
I am trying to extract a table from HTML that is not in the usual td/tr format and includes images.
The HTML code looks like this:
<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>
I would like to get the table:
[Location 45 out of fifty points,
Service 45 out of fifty points].
The following code only prints "Location" and "Service" and does not include the rating.
for url in urls:
    r = requests.get(url)
    time.sleep(delayTime)
    soup = BeautifulSoup(r.content, "lxml")
    data17 = soup.findAll('div', {'class': 'dynamicBottom'})
    for item in data17:
        print(item.text)
And the code
data18 = soup.find(attrs={'class': 'sprite-rating_s_fill rating_s_fill s45'})
print(data18["alt"] if data18 else "No meta title given")
does not help either: it only prints "45 out of fifty points", so it is not clear which category the rating belongs to. Additionally, the image class ('sprite-rating_s_fill rating_s_fill s45') varies in other tables depending on the rating.
Is there a way to extract the full table?
Or to tell Python to extract the image that follows a certain word, e.g. "Location"?
Thank you very much for your help!
html = '''<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="ratingRow wrap"):
    text = div.text.strip()
    alt = div.find('img').get('alt')
    print(text, alt)
out:
Location 45 out of fifty points
Service 45 out of fifty points
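Since the image class changes with the rating in other tables, it may help to avoid matching on that class at all and to tolerate rows without an image. The following is a sketch under assumptions: rating_rows is a hypothetical helper name, and the sample is a trimmed version of the HTML above in which the second rating (40) and the shortened class names are invented purely to show that different values are handled.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the "40 out of fifty points" row is illustrative only.
SAMPLE = """
<ul class="barChart"><li>
<div class="ratingRow wrap">
  <div class="label part "><span class="text">Location</span></div>
  <div class="wrap row part ">
    <span class="rate"><img class="s45" alt="45 out of fifty points"></span>
  </div>
</div>
<div class="ratingRow wrap">
  <div class="label part "><span class="text">Service</span></div>
  <div class="wrap row part ">
    <span class="rate"><img class="s40" alt="40 out of fifty points"></span>
  </div>
</div>
</li></ul>
"""


def rating_rows(html):
    """Return a list of (category, rating text) pairs, one per ratingRow."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select("div.ratingRow"):
        label = row.select_one("span.text")
        img = row.find("img")  # any image in the row, whatever its class
        category = label.get_text(strip=True) if label else ""
        rating = img.get("alt") if img else None
        rows.append((category, rating))
    return rows


ratings = rating_rows(SAMPLE)
for category, rating in ratings:
    print(category, rating)
```

A row without an image yields None for the rating instead of raising an AttributeError.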
I tried the following to identify the element, but I get a "No element found" message when I run my script.
Method 1 tried:
self.driver.find_element_by_xpath("//button[text()='Adopt and Initial']").click()
Method 2 tried:
self.driver.find_element_by_css_selector(".btn-primary.btn.left.item-alt").click()
HTML of the button:
Updated HTML code for the element. This is for DocuSign.
<div class="dialog is-signature-mode" tabindex="0">
<header class="dialog-header">
<h1 class="dialog-title">
<span class="item-alt" data-group="tagType" data-group-item="signature">Adopt Your Signature</span>
<span class="item-alt" data-group="tagType" data-group-item="initials" data-selected="">Adopt Your Initials</span>
</h1>
<nav class="icons">
<a class="close" data-action="cancelAdoptSignature">
<i class="icon-close"></i>
</a>
</nav>
</header>
<section class="dialog-body">
<article id="adopt">
<header class="ds-title p">
Confirm your name, initials, and signature.
</header>
<div class="full-name">
<div class="wrapper">
<label for="full-name">Full Name</label> <span class="error hidden">Name required</span>
<br>
<div class="text-input-wrapper">
<input id="full-name" disabled="" value="QAAuto 01Dec2014_10.41.03" name="fullname" type="text" class="required text-input" maxlength="50">
</div>
</div>
</div>
<div class="initials">
<div class="wrapper">
<label for="initials">Initials</label> <span class="error hidden">Initials required</span>
<br>
<div class="text-input-wrapper">
<input id="initials" disabled="" value="Q0" name="initials" type="text" class="required text-input" maxlength="50">
</div>
</div>
</div>
<div class="clear-float"></div>
</article>
<header class="tab-nav">
<ul>
<li>Select Style</li>
<li>Draw</li>
</ul>
</header>
<article id="select-style" class="tab-panel panel-select-style selected">
<h4 class="normal">Preview <span class="error"></span></h4>
<div class="signature-preview">
<div class="signature"><img alt="" src="https://demo.docusign.net/Signing/image.aspx?ti=56b2faad38e7427a99defd1dfaa258ce&insession=1&i=asig150&force=154&s=QAAuto+01Dec2014_10.41.03&f=7_DocuSign&nochrome=0" height="75px" class="signature-img left">
<img alt="" src="https://demo.docusign.net/Signing/image.aspx?ti=56b2faad38e7427a99defd1dfaa258ce&insession=1&i=ainit150&force=155&s=Q0&n=QAAuto+01Dec2014_10.41.03&f=7_DocuSign&nochrome=0" height="75px" class="initials-img right">
<div class="clear-float"></div></div>
<a class="change-style">
Change Style
</a>
<div class="clear-float"></div>
</div>
</article>
<article id="draw" class="tab-panel panel-draw">
<h4 class="normal">
<span class="item-alt-inline" data-group="tagType" data-group-item="signature">Draw your signature</span>
<span class="item-alt-inline" data-group="tagType" data-group-item="initials" data-selected="">Draw your initials</span>
<span class="error"></span>
</h4>
<a class="clear" data-ds="clear">Clear</a>
<div class="signature-draw signature">
<div class="canvas-wrapper"><canvas class="canvas" width="0" height="0"></canvas><canvas class="canvas" width="0" height="0"></canvas></div></div>
</article>
<p class="legalese">By clicking Adopt and Sign, I agree that the signature and initials will be the electronic representation of my signature and initials for all purposes when I (or my agent) use them on documents, including legally binding contracts - just the same as a pen-and-paper signature or initial.</p>
<hr>
<button class="btn-primary btn left item-alt" data-group="tagType" data-group-item="signature" type="button" data-ds="submit" value="initials">Adopt and Sign</button>
<button class="btn-primary btn left item-alt" data-group="tagType" data-group-item="initials" type="button" data-ds="submit" value="initials" data-selected="">Adopt and Initial</button>
<button class="close left btn btn-default" type="button" data-action="cancelAdoptSignature">Cancel</button>
<div class="clear-float"></div>
<div class="styles"></div></section>
</div>
Please try the XPath below and let me know what happens:
assert "Adopt and Initial" in self.driver.page_source
self.driver.find_element_by_xpath("//button[contains(text(),'Adopt and Initial')]").click()
Make sure you run the assertion before the click. If the assertion fails, the element is not present in the current frame, so you will have to switch to the correct frame before performing the action.
Let me know if you try this.
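If it turns out that the dialog lives inside an iframe, one possible way to search for the button is to try the top-level document first and then each iframe in turn. This is a rough sketch, not DocuSign-specific: click_in_any_frame is a made-up helper name, it only looks one level deep (no nested frames), and the two locator strings are the plain values behind Selenium's By.XPATH and By.TAG_NAME, written out so the snippet has no import of its own.

```python
XPATH = "xpath"        # same value as selenium's By.XPATH
TAG_NAME = "tag name"  # same value as selenium's By.TAG_NAME


def click_in_any_frame(driver, xpath):
    """Click the first element matching xpath in the page or a direct iframe.

    Hypothetical helper. Returns True if something was clicked; in that
    case the driver stays switched to the frame where the element was found.
    """
    driver.switch_to.default_content()
    frame_count = len(driver.find_elements(TAG_NAME, "iframe"))
    # Index -1 stands for the top-level document, 0..n-1 for the iframes.
    for index in range(-1, frame_count):
        driver.switch_to.default_content()
        if index >= 0:
            # Re-locate the iframes each time to avoid stale references.
            frames = driver.find_elements(TAG_NAME, "iframe")
            driver.switch_to.frame(frames[index])
        matches = driver.find_elements(XPATH, xpath)
        if matches:
            matches[0].click()
            return True
    driver.switch_to.default_content()
    return False
```

In the script from the question it might be called as click_in_any_frame(self.driver, "//button[contains(text(),'Adopt and Initial')]"). An explicit wait for the dialog to appear may still be needed before calling it.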