How to get a specific part of the HTML in Python BeautifulSoup without using a loop - python

My code:
page = requests.get("http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019")
c = page.content
soup=BeautifulSoup(c,"html.parser")
tables=soup.find_all("table",{"style":"width: 500px;"})
HTML table:
<table style="width: 500px;" border="2">
<tbody>
<tr>
<td colspan="2">
<p style="text-align: center;"><span style="color: #ff0000;"><strong>Union Public Service Commission (UPSC)</strong></span></p>
<p style="text-align: center;"><span style="color: #ff00ff;"><strong>Advt No.01/2019</strong></span></p>
<p style="text-align: center;"><span style="color: #008000;"><strong><strong><strong>Engineering Services (Prelims) Exam 2019</strong></strong></strong></span></p>
<p style="text-align: center;"><strong>WWW.FREEJOBALERT.COM</strong></p>
</td>
</tr>
<tr>
<td style="text-align: center;" colspan="2"><span style="color: #ff0000;"><strong>Application Fee</strong></span></p>
<ul>
<li style="text-align: left;"><span style="line-height: 19px;">For Female/SC/ST/ PH: <strong>NIL</strong></span></li>
<li style="text-align: left;"><span style="line-height: 19px;">For Others: <strong>Rs. 200/-</strong></span></li>
<li style="text-align: left;">Candidates can pay either by depositing the money in any Branch of SBI by cash or by using net banking facility of SBI.</li>
</ul>
</td>
</tr>
<tr>
<td style="text-align: center;" colspan="2"><span style="color: #ff0000;"><strong><strong>Important Dates</strong></strong></span></p>
<ul>
<li style="text-align: left;">Starting Date to Apply Online: <strong>26-09-2018</strong></li>
<li style="text-align: left;"><span style="line-height: 19px;">Last Date to Apply Online: <strong>22-10-2018 till 06:00 PM</strong></span></li>
<li style="text-align: left;"><span style="line-height: 19px;">Date for Preliminary Exam:<strong> 06-01-2019</strong></span></li>
<li style="text-align: left;"><span style="line-height: 19px;">Last date for Fee Payment (Pay by cash):<strong> 21-10-2018 at 11.59 PM</strong></span></li>
<li style="text-align: left;"><span style="line-height: 19px;">Last date for Fee Payment (online):<strong> 22-10-201</strong></span><strong>8 till 06:00 PM</strong></li>
</ul>
</td>
</tr>
</tbody>
</table>
I am expecting:
[
<td colspan="2">
<p style="text-align: center;"><span style="color: #ff0000;"><strong>Union Public Service Commission (UPSC)</strong></span></p>
<p style="text-align: center;"><span style="color: #ff00ff;"><strong>Advt No.01/2019</strong></span></p>
<p style="text-align: center;"><span style="color: #008000;"><strong><strong><strong>Engineering Services (Prelims) Exam 2019</strong></strong></strong></span></p>
<p style="text-align: center;"><strong>WWW.FREEJOBALERT.COM</strong></p>
</td>
]
How can I get this part of the HTML using Beautiful Soup without using a loop?
I want all elements with colspan="2".
Please have a look at this code.
Thanks.

What prevents you from calling find_all on that element directly?
from bs4 import BeautifulSoup
import requests
page = requests.get("http://www.freejobalert.com/upsc-recruitment/"
"16960/#Engg-Services2019")
c = page.content
soup = BeautifulSoup(c,"html.parser")
tables = soup.find_all("td", attrs={'colspan':'2'})
print(tables)
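As a minimal sketch of the same idea on an inline snippet (the HTML below is a shortened stand-in for the real page, not the page itself): find() returns only the first matching td, while find_all() or the CSS selector form select() returns every td with colspan="2", with no explicit loop in your own code.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the table in the question.
html = """
<table style="width: 500px;" border="2">
<tr><td colspan="2"><p><strong>Union Public Service Commission (UPSC)</strong></p></td></tr>
<tr><td colspan="2"><strong>Application Fee</strong></td></tr>
<tr><td>no colspan here</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the first matching cell.
first = soup.find("td", attrs={"colspan": "2"})

# Every matching cell, via a CSS attribute selector.
cells = soup.select('td[colspan="2"]')

print(first.get_text(strip=True))
print(len(cells))
```

If only the first cell is wanted, as in the expected output of the question, find() is probably the simpler choice.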

Related

Selenium find similar links based on conditional tags

I need to find a specific href link. Below is an example of three rows. The rows are very similar, but they differ slightly. I need the link for the Product ABC row that is MSSQL (shown as SQLServer in the table) and CS.
<tr>
<th class=\"align-middle\" scope=\"row\">
<span class=\"badge bg-primary position-relative py-2\">Product ABC
<span class=\"position-absolute top-0 start-100 translate-middle badge rounded-pill bg-secondary\">P3
</span>
</span>
</th>
<td class=\"align-middle small\">MySQL</td>
<td class=\"align-middle small\">MR</td>
<td class=\"align-middle small\">
<div class=\"btn-group\" role=\"group\">
<span data-bs-placement=\"left\" data-bs-toggle=\"tooltip\" title=\"\" data-bs-original-title=\"Show Application\" aria-label=\"Show Application\">
<a class=\"btn btn-sm btn-outline-primary\" href=\"/repo/applications/328\">
<svg class=\"bi flex-shrink-0\" height=\"18\" role=\"img\" width=\"18\">
<use href=\"#icon_eye\"></use>
</svg>
</a>
</span>
</div>
</td>
</tr>
<tr>
<th class=\"align-middle\" scope=\"row\">
<span class=\"badge bg-primary position-relative py-2\">Product ABC
<span class=\"position-absolute top-0 start-100 translate-middle badge rounded-pill bg-secondary\">P3
</span>
</span>
</th>
<td class=\"align-middle small\">MySQL</td>
<td class=\"align-middle small\">MR</td>
<td class=\"align-middle small\">
<div class=\"btn-group\" role=\"group\">
<span data-bs-placement=\"left\" data-bs-toggle=\"tooltip\" title=\"\" data-bs-original-title=\"Show Application\" aria-label=\"Show Application\">
<a class=\"btn btn-sm btn-outline-primary\" href=\"/repo/applications/329\">
<svg class=\"bi flex-shrink-0\" height=\"18\" role=\"img\" width=\"18\">
<use href=\"#icon_eye\"></use>
</svg>
</a>
</span>
</div>
</td>
</tr>
<tr>
<th class=\"align-middle\" scope=\"row\">
<span class=\"badge bg-primary position-relative py-2\">Product ABC
<span class=\"position-absolute top-0 start-100 translate-middle badge rounded-pill bg-secondary\">P3
</span>
</span>
</th>
<td class=\"align-middle small\">SQLServer</td>
<td class=\"align-middle small\">CS</td>
<td class=\"align-middle small\">
<div class=\"btn-group\" role=\"group\">
<span data-bs-placement=\"left\" data-bs-toggle=\"tooltip\" title=\"\" data-bs-original-title=\"Show Application\" aria-label=\"Show Application\">
<a class=\"btn btn-sm btn-outline-primary\" href=\"/repo/applications/330\">
<svg class=\"bi flex-shrink-0\" height=\"18\" role=\"img\" width=\"18\">
<use href=\"#icon_eye\"></use>
</svg>
</a>
</span>
</div>
</td>
</tr>
I currently have this:
element = driver.find_element(By.XPATH, "//tr[.//span[contains(.,'Product ABC')]]//a")
element.get_attribute("href")
The code above works, but it returns the first Product ABC row it sees. In some cases that is fine, but sometimes it is incorrect. How do I filter my XPath so that it returns the href applications/330 and not the others?
If you want to select the a element containing the desired href link based on both the Product ABC value and the SQLServer value, the XPath locator is as follows:
element = driver.find_element(By.XPATH, "//tr[.//span[contains(.,'Product ABC')] and .//td[contains(.,'SQLServer')]]//a")
If you also need to add a dependency on CS, it can be added in the same way:
element = driver.find_element(By.XPATH, "//tr[.//span[contains(.,'Product ABC')] and .//td[contains(.,'SQLServer')] and .//td[contains(.,'CS')]]//a")
If you need to locate the link element based on MySQL and/or MR, this can be done in the same manner.
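Since every variant follows the same pattern, the locator string can be assembled by a small helper. This is only a sketch: build_row_link_xpath is a hypothetical name, not part of Selenium, and it assumes the values contain no single quotes.

```python
def build_row_link_xpath(product, *cell_texts):
    """Build an XPath for the <a> inside the row matching all given texts.

    Hypothetical helper: `product` is matched against the row's span,
    each entry of `cell_texts` against some td in the same row.
    """
    conditions = [".//span[contains(.,'{}')]".format(product)]
    conditions += [".//td[contains(.,'{}')]".format(text) for text in cell_texts]
    return "//tr[{}]//a".format(" and ".join(conditions))


xpath_cs = build_row_link_xpath("Product ABC", "SQLServer", "CS")
xpath_mr = build_row_link_xpath("Product ABC", "MySQL", "MR")
print(xpath_cs)
print(xpath_mr)
```

The resulting string would then be passed to driver.find_element(By.XPATH, ...) as in the lines above. Note that contains() is a substring match, so the MySQL/MR locator still matches both of the first two sample rows and find_element returns the first one.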

HTML table to database

At this point, my table looks as follows:
<table border="0" cellpadding="0" cellspacing="0" class="ms-formtable" id="formTbl" style="margin-top: 8px;" width="100%">
<tbody>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_FileLeafRef">
</a>
Name
</h3>
</td>
<td class="ms-formbody" id="SPFieldFile" valign="top" width="450px">
<a href="http://google.com" onclick="DispDocItemEx(this, 'FALSE', 'FALSE', 'FALSE', '');">
X
</a>
</td>
</tr>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_Owner">
</a>
Name#
</h3>
</td>
<td class="ms-formbody" id="SPFieldChoice" valign="top" width="450px">
Z
</td>
</tr>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_DirectiveRank">
</a>
Age
</h3>
</td>
<td class="ms-formbody" id="SPFieldChoice" valign="top" width="450px">
52
</td>
</tr>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_Number">
</a>
number
</h3>
</td>
<td class="ms-formbody" id="SPFieldText" valign="top" width="450px">
1
</td>
</tr>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_Title">
</a>
Name of File
</h3>
</td>
<td class="ms-formbody" id="SPFieldText" valign="top" width="450px">
Funny Names
</td>
</tr>
<tr>
<td class="ms-formlabel" nowrap="true" valign="top" width="165px">
<h3 class="ms-standardheader">
<a name="SPBookmark_EffectiveFrom">
</a>
date
</h3>
</td>
<td class="ms-formbody" id="SPFieldDateTime" valign="top" width="450px">
1.1.2022
</td>
</tr>
</tbody>
</table>
I basically need to open an HTML file, select the table with id "formTbl", and then either create JSON with values such as {Firsttd:Secondtd, "Name":"Test", "Date":"Blank"} or insert the data into a database, with the first td in table A and the second td in table B (each tr contains two td elements: the first is the column name and the second is the value). Is there any way to do this? I've tried using Python, where the JSON I have so far looks like [["","Name","","Test",""],["","Age","","12",""]], and in C# I've tried HTMLAgilityPack, but it wasn't working.
Here is a solution with jQuery.
<html>
<body>
<table id="example-table">
<tr>
<th>Name</th>
<th>Name#</th>
<th>Age</th>
<th>Number</th>
<th>Name of file</th>
<th>Date</th>
</tr>
<tr>
<td>X</td>
<td>Z</td>
<td>52</td>
<td>1</td>
<td>Name of file</td>
<td>2021-22-10</td>
</tr>
</table>
<textarea rows="10" cols="50" id="jsonTextArea">
</textarea>
</body>
</html>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/table-to-json@1.0.0/lib/jquery.tabletojson.min.js"></script>
<script type="text/javascript">
var tableToJson = $('#example-table').tableToJSON();
var sendingData = JSON.stringify (tableToJson);
$('#jsonTextArea').val(sendingData);
// Send JSON data to backend
$.post('http://localhost/test.php', {sendingData}, function(data, textStatus, xhr) {
var backendResponse = data;
console.log(backendResponse);
});
</script>
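For the two-column layout in the question (first td is the label, second td is the value), a Python sketch with BeautifulSoup could look like the following. The HTML is a shortened copy of two rows from the question, and the database step is left out because the schema is not given.

```python
import json

from bs4 import BeautifulSoup

# Shortened copy of two rows from the question's table.
html = """
<table id="formTbl" class="ms-formtable">
<tr>
<td class="ms-formlabel"><h3 class="ms-standardheader"><a name="SPBookmark_FileLeafRef"></a>Name</h3></td>
<td class="ms-formbody"><a href="http://google.com">X</a></td>
</tr>
<tr>
<td class="ms-formlabel"><h3 class="ms-standardheader"><a name="SPBookmark_DirectiveRank"></a>Age</h3></td>
<td class="ms-formbody">52</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="formTbl")

record = {}
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 2:  # skip rows that are not label/value pairs
        label = cells[0].get_text(strip=True)
        value = cells[1].get_text(strip=True)
        record[label] = value

as_json = json.dumps(record)
print(as_json)
```

Using get_text(strip=True) on each cell avoids the empty strings that appear in the [["","Name","","Test",""], ...] output mentioned in the question, which probably came from whitespace between the tags.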

Multiplying functions depending on a previous input

I'm trying to write a small script that generates an HTML file. The file must contain a table with two rows that hold information about different products. I managed to get this done, but now I need the table to be repeated a number of times that depends on a previous input. I thought I could multiply the function that contains the HTML code, but that doesn't work. I'm not quite sure what I'm doing here, so a little help would be appreciated.
This is what I want:
Input ---> How many tables: 3
So the HTML output file should look something like this:
<-- TABLE 1 -->
<table>
<tr>
<td colspan="4" height="30"></td>
</tr>
<tr>
<td width="50" class="width6p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_1" width="230" alt="DESCRIPTION_1" style="display:block" border="0" class="width90p"/></td>
<td width="30" class="width3p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_2" width="230" alt="DESCRIPTION_2" style="display:block" border="0" class="width90p"/></td>
</tr>
<tr>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_1</span ><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_1</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-1</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_1</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_2</span><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_2</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-2</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_2</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
</tr>
<tr>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
</tr>
</table>
<-- TABLE 2 -->
<table>
<tr>
<td colspan="4" height="30"></td>
</tr>
<tr>
<td width="50" class="width6p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_1" width="230" alt="DESCRIPTION_1" style="display:block" border="0" class="width90p"/></td>
<td width="30" class="width3p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_2" width="230" alt="DESCRIPTION_2" style="display:block" border="0" class="width90p"/></td>
</tr>
<tr>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_1</span ><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_1</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-1</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_1</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_2</span><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_2</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-2</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_2</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
</tr>
<tr>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
</tr>
</table>
<-- TABLE 3 -->
<table>
<tr>
<td colspan="4" height="30"></td>
</tr>
<tr>
<td width="50" class="width6p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_1" width="230" alt="DESCRIPTION_1" style="display:block" border="0" class="width90p"/></td>
<td width="30" class="width3p"></td>
<td width="260" class="width44p"><img src="http://site/image/CODEPRODUCT_2" width="230" alt="DESCRIPTION_2" style="display:block" border="0" class="width90p"/></td>
</tr>
<tr>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_1</span ><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_1</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-1</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_1</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">DESCRIPTION_2</span><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14"> DESCRIPTION_2</span><br/>
<span style="font-size:12px;" class="font12">SKU: CODEPRODUCT_-2</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">PRICE_2</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
</tr>
<tr>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
</tr>
</table>
Here is my Python code
import locale
import requests
import urlparse
import json
def html(sku_01,desc_01,sku_precio_1,sku_02,desc_02,sku_precio_2,bloque_prod):
    f = open('mkt-output.html','w')
    f.write(bloque_prod)
    f.close()
if __name__ == '__main__':
    sku_01 = raw_input('Ingrese SKU: ')
    desc_01 = raw_input('Descripcion de SKU: ')
    sku_precio_1 = raw_input('Precio de SKU: ')
    sku_02 = raw_input('Ingrese SKU: ')
    desc_02 = raw_input('Descripcion de SKU: ')
    sku_precio_2 = raw_input('Precio de SKU: ')
    bloque_prod = """<table>
<tr>
<td colspan="4" height="30"></td>
</tr>
<tr>
<td width="50" class="width6p"></td>
<td width="260" class="width44p"><img src="http://site/images/{}" width="230" alt="{}" style="display:block" border="0" class="width90p"/></td>
<td width="30" class="width3p"></td>
<td width="260" class="width44p"><img src="http://site/images/{}" width="230" alt="{}" style="display:block" border="0" class="width90p"/></td>
</tr>
<tr>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">{}</span ><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14">{} {}</span><br/>
<span style="font-size:12px;" class="font12">SKU: {}-{}</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">{}</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">{}</span><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14">{} {}</span><br/>
<span style="font-size:12px;" class="font12">SKU: {}-{}</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">{}</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
</tr>
<tr>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/templates/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/templates/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
</tr>
</table>""".format(sku_01,
sku_01,
desc_01,
sku_02,
sku_02,
desc_02,
' '.join(desc_01.split()[0:3]),
' '.join(desc_01.split()[3:-1]),
desc_01.split()[-1],
sku_01[0:-1],
sku_01[-1],
sku_precio_1,
' '.join(desc_02.split()[0:3]),
' '.join(desc_02.split()[3:-1]),
desc_02.split()[-1],
sku_02[0:-1],
sku_02[-1],
sku_precio_2,
sku_01,
sku_02)
html(sku_01, desc_01, sku_precio_1, sku_02, desc_02, sku_precio_2, bloque_prod)
If you need duplicate data written to the HTML file, you could simply have the html() function write variable bloque_prod multiple times by multiplying it:
def html(sku_01,desc_01,sku_precio_1,sku_02,desc_02,sku_precio_2,bloque_prod,tables):
    f = open('mkt-output.html','w')
    f.write(bloque_prod * tables)
    f.close()
Note the addition of the variable tables for the number of table duplicates.
Then define the variable tables in the if __name__ == '__main__' block:
tables = input('Tables: ')
…and add tables into the last line where you call html()
html(sku_01, desc_01, sku_precio_1, sku_02, desc_02, sku_precio_2, bloque_prod,tables)
Are you looking to write different tables?
Also, if it's necessary to annotate which table is currently being printed, you could use a for loop in html():
def html(sku_01,desc_01,sku_precio_1,sku_02,desc_02,sku_precio_2,bloque_prod,tables):
    f = open('mkt-output.html','w')
    rawHTML = ""
    for table in range(0, tables):
        rawHTML += ("\n<-- TABLE " + str(table) + " -->\n" + bloque_prod)
    f.write(rawHTML)
    f.close()
(if you wanted the numbering to start at 1, you'd just change str(table) to str(table + 1))
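A self-contained sketch of the same loop, written so it runs on Python 2 and 3. Two things here are my assumptions rather than part of the answer above: the helper name repeat_with_labels, and the use of <!-- ... --> so that the labels are valid HTML comments. On Python 3, input() returns a string, so the count must be converted with int() before it is used in range() or in string multiplication.

```python
def repeat_with_labels(block, count):
    """Return `block` repeated `count` times, each copy preceded by a numbered comment."""
    parts = []
    for number in range(1, count + 1):  # numbering starts at 1
        parts.append("<!-- TABLE {} -->\n{}".format(number, block))
    return "\n".join(parts)


# int("3") stands in for int(raw_input('Tables: ')) on Python 2
# or int(input('Tables: ')) on Python 3.
tables = int("3")
output = repeat_with_labels("<table></table>", tables)
print(output)
```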
--EDIT-- It seems that you are looking to create tables with different values. I would rewrite the program as such to do this:
import locale
import requests
import urlparse
import json
def createTable(sku_01,desc_01,sku_precio_1,sku_02,desc_02,sku_precio_2):
    bloque_prod = """<table>
<tr>
<td colspan="4" height="30"></td>
</tr>
<tr>
<td width="50" class="width6p"></td>
<td width="260" class="width44p"><img src="http://site/images/{}" width="230" alt="{}" style="display:block" border="0" class="width90p"/></td>
<td width="30" class="width3p"></td>
<td width="260" class="width44p"><img src="http://site/images/{}" width="230" alt="{}" style="display:block" border="0" class="width90p"/></td>
</tr>
<tr>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">{}</span ><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14">{} {}</span><br/>
<span style="font-size:12px;" class="font12">SKU: {}-{}</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">{}</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
<td></td>
<td class="font14" valign="top" style=" font-size: 16px; inline-height:0px; font-family:Helvetica, sans-serif; font-weight:lighter; color:#666666; line-height:130%; padding:10px 0px;">
<span style="font-weight: bold; color:#008EAA" class="font14">{}</span><br />
<span style="font-weight: bold; color:#008EAA; font-size:14px;" class="font14">{} {}</span><br/>
<span style="font-size:12px;" class="font12">SKU: {}-{}</span><br />
<span style="font-size:18px;" class="font14">$ </span>
<span style="font-size:24px; line-height:30px;" class="font20">{}</span>
<span style="font-size:12px; text-transform: uppercase;" class="font10"> C/U</span> <br>
</td>
</tr>
<tr>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/templates/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
<td></td>
<td style="inline-height:0px;padding-top:4px;"><img src="http://www.site/templates/images/loquiero_med.png" width="142" title="Ver producto" style="display:block" border="0" class="width115"/></td>
</tr>
</table>""".format(sku_01,
sku_01,
desc_01,
sku_02,
sku_02,
desc_02,
' '.join(desc_01.split()[0:3]),
' '.join(desc_01.split()[3:-1]),
desc_01.split()[-1],
sku_01[0:-1],
sku_01[-1],
sku_precio_1,
' '.join(desc_02.split()[0:3]),
' '.join(desc_02.split()[3:-1]),
desc_02.split()[-1],
sku_02[0:-1],
sku_02[-1],
sku_precio_2,
sku_01,
sku_02)
    return bloque_prod
if __name__ == "__main__":
    f = open('mkt-output.html','w+') # Opened once before the loop, so each table is written after the previous one
    for table in range(0,input("Tables: ")):
        print ("--Table "+str(table)+"--")
        sku_01 = raw_input('Ingrese SKU: ')
        desc_01 = raw_input('Descripcion de SKU: ')
        sku_precio_1 = raw_input('Precio de SKU: ')
        sku_02 = raw_input('Ingrese SKU: ')
        desc_02 = raw_input('Descripcion de SKU: ')
        sku_precio_2 = raw_input('Precio de SKU: ')
        f.write(createTable(sku_01,desc_01,sku_precio_1,sku_02,desc_02,sku_precio_2))
    f.close()
Hope that helps.
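One optional refinement, offered as a sketch rather than as part of the answer above: with positional {} fields it is easy for the arguments to drift out of step with the template (by my count the template has 16 placeholders while .format() receives 20 arguments, and str.format silently ignores the extras). Named fields make each value explicit. The template below is a hypothetical, heavily shortened version of the original markup, and render_product is an illustrative name.

```python
# Hypothetical, shortened template: named fields instead of positional {}.
ROW_TEMPLATE = (
    '<td><img src="http://site/images/{sku}" alt="{desc}"/></td>\n'
    '<td><span>SKU: {sku_head}-{sku_tail}</span> <span>$ {price}</span></td>'
)


def render_product(sku, desc, price):
    """Fill the template for one product; each value is named, not counted."""
    return ROW_TEMPLATE.format(
        sku=sku,
        desc=desc,
        sku_head=sku[:-1],  # everything except the last character
        sku_tail=sku[-1],   # the last character
        price=price,
    )


cell = render_product("12345", "Blue widget large", "9.99")
print(cell)
```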

How to find a specific tag by text with BeautifulSoup in Python

<table class="person show-interviews interviews-loaded" application="43352812" current-interview-stage-id="373822" candidate_hiring_plan="52607">
<tbody><tr class="basic-info clickable candidate">
<td class="photo-column" href="/people/34284587?application_id=43352812&src=search">
<img class="person-photo" width="40" height="40" alt="Candidate Profile Picture" src="https://gravatar.com/avatar/b6d305a017cc572d47807d9e6812bef1.png?s=40&d=https%3A%2F%2Fcdn.greenhouse.io%2Fassets%2Fsilhouette-7fdf9a27e7e8acd6f7cad72986479543.png">
</td>
<td class="person-info-column" href="/people/34284587?application_id=43352812&src=search">
<p class="name">
Chew Bacca
<img class="email-candidate-icon" title="Email Chew" width="16" modal_path="/people/34284587/email_candidate_modal?application_id=43352812" src="https://cdn.greenhouse.io/assets/icons/email-fd1e71440bb47a93b13bccdbffa4d311.png" alt="Email">
</p>
</td>
<td class="job-info-column" href="/people/34284587?application_id=43352812&src=search">
<p class="job">Consulting Engineer </p>
<div class="status">
<a class="toggle-interviews" href="#">1 interview to schedule for Face to Face</a>
</div>
</td>
<td class="interview-kit-column" nofollow="true">
<div class="interview-kit-wrapper">
<span class="interview-kit-icon"></span><br>
<a modal_path="/people/34284587/applications/43352812/submit_feedback_options" class="submit-feedback-link" href="#">interview kit</a>
</div>
<label class="bulk-checkbox-wrapper">
<input class="bulk-checkbox" type="checkbox">
</label>
</td>
</tr>
<tr class="availability">
<td colspan="3" class="details name">
<div class="header">
<div class="left-col">
<span class="title closed no-expand">Availability</span>
<span class="state">
<div class="dropdown">
<button name="button" type="submit" id="quick_action_304014813" class="link-like-button" data-toggle="dropdown" aria-has-popup="true" aria-expanded="false">Not Requested</button>
<ul class="dropdown-menu" aria-labelledby="quick_action_304014813">
<li data-type="state" data-url="/people/availability/304014813/state" data-state="not_requested" class="dropdown-item" data-current-state="true">Not Requested</li>
<li data-type="state" data-url="/people/availability/304014813/state" data-state="requested" class="dropdown-item">Requested</li>
<li data-type="state" data-url="/people/availability/304014813/state" data-state="received" class="dropdown-item">Received</li>
<li data-type="state" data-url="/people/availability/304014813/state" data-state="confirmation_sent" class="dropdown-item">Confirmation Sent</li>
<li data-type="action" data-url="/people/availability/edit_modal/304014813?force=true" data-action="edit_availability" class="dropdown-item action-item">ENTER AVAILABILITY MANUALLY</li>
<li data-type="action" data-url="/people/availability/cofirm_modal/304014813?force=true" data-action="send_confirmation" class="dropdown-item action-item">SEND INTERVIEW CONFIRMATION</li>
</ul>
</div>
<span class="action-time"></span>
</span>
</div>
<span class="action">
<button name="button" type="submit" class="link-like-button availability-modal-open" modal_path="/people/availability/request_modal/304014813" data-modal-path="/people/availability/request_modal/304014813">Request Availability</button>
</span>
</div>
<div class="body">
<div class="times-container">
<div class="times proposed">
<div class="title">Suggested Times:</div>
<ul>
</ul>
</div>
<div class="times candidate">
<div class="title">
Chew is available at these times:
</div>
Not yet responded <button name="button" type="button" modal_path="/people/availability/edit_modal/304014813" class="link-like-button availability-edit-modal-open">Edit</button>
</div>
</div>
</div>
</td>
<td class="interview-kit-column"></td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="553192" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/553364/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Cultural Fit Interview
</td>
<td class="details">
<div class="wrapper">
<div class="interview-info">
Skipped <span href="/interviews/49710750/unskip" class="unskip-link">Unskip</span>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="553193" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/553365/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Peer Panel Interview
</td>
<td class="details">
<div class="wrapper">
<div class="interview-info">
Skipped <span href="/interviews/49710751/unskip" class="unskip-link">Unskip</span>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="553194" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/553366/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Case Study
</td>
<td class="details">
<div class="wrapper">
<div class="interview-info">
Skipped <span href="/interviews/49710752/unskip" class="unskip-link">Unskip</span>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="553195" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/553367/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Executive Interview
</td>
<td class="details">
<div class="wrapper">
<div class="interview-info">
Skipped <span href="/interviews/49710753/unskip" class="unskip-link">Unskip</span>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="4883928" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/4884061/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Challenge
</td>
<td class="details schedulable removable" modal_path="/interviews/schedule?application_id=43352812&interview_kit_id=4884061" modal_title="Consulting Engineer (Austin, New York City, Palo Alto)" nofollow="true" title="Schedule Interview">
<div class="wrapper">
<span href="/interviews/49710754/skip" class="x" title="Skip this interview"></span>
<span class="to-be-scheduled-icon"></span>
<div class="interview-info">
Schedule Interview
<div class="integration-buttons">
</div>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
<tr class="interview spicy" application_id="43352812" step_id="4883933" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/4884066/people/34284587?application_id=43352812" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Personality Assessment
</td>
<td class="details">
<div class="wrapper">
<div class="interview-info">
Skipped <span href="/interviews/49710755/unskip" class="unskip-link">Unskip</span>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
</tbody></table>
<table class="person show-interviews interviews-loaded" application="31024648" current-interview-stage-id="373842" candidate_hiring_plan="52610">
<tbody><tr class="basic-info clickable candidate">
<td class="photo-column" href="/people/5879170?application_id=31024648&src=search">
<img class="person-photo" width="30" height="40" alt="Candidate Profile Picture" src="https://prod-heroku.s3.amazonaws.com/people/photos/005/879/170/resized/imgres.jpg?AWSAccessKeyId=AKIAIK36UTOKQ5F2YNMQ&Expires=1495711223&Signature=GuPHCM1nw%2B2tC%2F44rHejCRvnsx0%3D">
</td>
<td class="person-info-column" href="/people/5879170?application_id=31024648&src=search">
<p class="name">
Jessica Alba
<span class="alert" title="Jessica Alba has been in Phone Interview for more than 14 days">Alert</span>
</p>
<p class="title">New York University</p>
</td>
<td class="job-info-column" href="/people/5879170?application_id=31024648&src=search">
<p class="job">Enterprise Account Executive (North America)</p>
<div class="status">
<a class="toggle-interviews" href="#">1 interview to schedule for Phone Interview</a>
</div>
</td>
<td class="interview-kit-column" nofollow="true">
<div class="interview-kit-wrapper">
<span class="interview-kit-icon"></span><br>
<a modal_path="/people/5879170/applications/31024648/submit_feedback_options" class="submit-feedback-link" href="#">interview kit</a>
</div>
<label class="bulk-checkbox-wrapper">
<input class="bulk-checkbox" type="checkbox">
</label>
</td>
</tr>
<tr class="availability">
<td colspan="3" class="details name">
<div class="header">
<div class="left-col">
<span class="title closed no-expand">Availability</span>
<span class="state">
<div class="dropdown">
<button name="button" type="submit" id="quick_action_210624650" class="link-like-button" data-toggle="dropdown" aria-has-popup="true" aria-expanded="false">Not Requested</button>
<ul class="dropdown-menu" aria-labelledby="quick_action_210624650">
<li data-type="state" data-url="/people/availability/210624650/state" data-state="not_requested" class="dropdown-item" data-current-state="true">Not Requested</li>
<li data-type="state" data-url="/people/availability/210624650/state" data-state="requested" class="dropdown-item">Requested</li>
<li data-type="state" data-url="/people/availability/210624650/state" data-state="received" class="dropdown-item">Received</li>
<li data-type="state" data-url="/people/availability/210624650/state" data-state="confirmation_sent" class="dropdown-item">Confirmation Sent</li>
<li data-type="action" data-url="/people/availability/edit_modal/210624650?force=true" data-action="edit_availability" class="dropdown-item action-item">ENTER AVAILABILITY MANUALLY</li>
<li data-type="action" data-url="/people/availability/cofirm_modal/210624650?force=true" data-action="send_confirmation" class="dropdown-item action-item">SEND INTERVIEW CONFIRMATION</li>
</ul>
</div>
<span class="action-time"></span>
</span>
</div>
<span class="action">
<button name="button" type="submit" class="link-like-button availability-modal-open" modal_path="/people/availability/request_modal/210624650" data-modal-path="/people/availability/request_modal/210624650">Request Availability</button>
</span>
</div>
<div class="body">
<div class="times-container">
<div class="times proposed">
<div class="title">Suggested Times:</div>
<ul>
</ul>
</div>
<div class="times candidate">
<div class="title">
Jessica is available at these times:
</div>
Not yet responded <button name="button" type="button" modal_path="/people/availability/edit_modal/210624650" class="link-like-button availability-edit-modal-open">Edit</button>
</div>
</div>
</div>
</td>
<td class="interview-kit-column"></td>
</tr>
<tr class="interview spicy" application_id="31024648" step_id="553218" stage_id="" style="">
<td colspan="2" rowspan="1" class="name" href="/guides/553390/people/5879170?application_id=31024648" title="View Interview Kit">
<span class="interview-kit-icon small"></span>Technical Phone Interview
</td>
<td class="details schedulable removable" modal_path="/interviews/schedule?application_id=31024648&interview_kit_id=553390" modal_title="Enterprise Account Executive (North America)" nofollow="true" title="Schedule Interview">
<div class="wrapper">
<span href="/interviews/23067896/skip" class="x" title="Skip this interview"></span>
<span class="to-be-scheduled-icon"></span>
<div class="interview-info">
Schedule Interview
<div class="integration-buttons">
</div>
</div>
</div>
</td>
<td class="interview-kit-column">
</td>
</tr>
</tbody></table>
There are multiple tables with the class person show-interviews interviews-loaded. I want to extract the class only from the table whose text matches or contains Challenge, and ignore the other tables. This is what I have tried so far:
from bs4 import BeautifulSoup
with open('Page_Source.html') as page_source:
    soup = BeautifulSoup(page_source, 'html.parser')
for table in soup.findAll('table', {'class': 'person show-interviews interviews-loaded'}):
    name = table.find('p', {'class': 'name'}).find('a').text
    # print(name)
    # print(table['application'])
    # print(table['current-interview-stage-id'])
    job_title = table.find('p', {'class': 'job'}).text
    # print(job_title)
    next_interview_details = table.find('a', {'class': 'toggle-interviews'}).text
    # print(next_interview_details)
    for tr in table.findAll('tr', {'class': 'interview spicy'}):
        i = tr.find('td', text='Challenge')
        print(i)
You can select the desired table(s) by passing a filtering function that checks whether the substring Challenge is present in the table's text:
for table in soup.find_all(lambda tag: tag.name == 'table' and 'Challenge' in tag.get_text()):
    print(table.get('class'))
Prints:
['person', 'show-interviews', 'interviews-loaded']
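A self-contained sketch of the same filter, run against a cut-down document. The two-table HTML string and the helper name tables_containing are made up for illustration; only the idea of passing a function to find_all comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down stand-in for the page source in the question.
SAMPLE_HTML = """
<table class="person show-interviews interviews-loaded" application="1">
  <tr class="interview spicy">
    <td class="name"><span class="interview-kit-icon small"></span>Challenge</td>
  </tr>
</table>
<table class="person show-interviews" application="2">
  <tr class="interview spicy">
    <td class="name"><span class="interview-kit-icon small"></span>Technical Phone Interview</td>
  </tr>
</table>
"""


def tables_containing(soup, needle):
    """Return every <table> whose combined text contains `needle`."""
    return soup.find_all(
        lambda tag: tag.name == "table" and needle in tag.get_text()
    )


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Collect the class list of each matching table; the second table is skipped.
matching_classes = [
    table.get("class") for table in tables_containing(soup, "Challenge")
]
for classes in matching_classes:
    print(classes)
```

Note that get_text() looks at the whole table, so a table matches if the word appears anywhere inside it, not only in an interview name cell.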
Ask BeautifulSoup to give you the list of tables. Then look at each table, asking whether it contains 'Challenge'. If it does then display the class attribute for that table.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('temp.htm').read(),'lxml')
>>> tables = soup.findAll('table')
>>> for table in tables:
...     if 'Challenge' in table.text:
...         table.attrs['class']
...
['person', 'show-interviews', 'interviews-loaded']
EDIT: Response to comment. I haven't written the code as a filter this time because I wanted to make the logic more apparent.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('temp.htm').read(),'lxml')
>>> tables = soup.findAll('table')
>>> for table in tables:
...     '----->', table.attrs['class']
...     target_tds = [_.parent for _ in table.findAll('span', attrs={'class': 'interview-kit-icon small'})]
...     for target_td in target_tds:
...         target_td.text.strip(), 'Skipped' in target_td.find_next_siblings()[0].text
...
('----->', ['person', 'show-interviews', 'interviews-loaded'])
('Cultural Fit Interview', True)
('Peer Panel Interview', True)
('Case Study', True)
('Executive Interview', True)
('Challenge', False)
('Personality Assessment', True)
('----->', ['person', 'show-interviews', 'interviews-loaded'])
('Technical Phone Interview', False)
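If only the matching row is wanted rather than the whole table, the same idea can be applied to the tr elements. The sketch below assumes the row layout shown in the question (a td with class name followed by a td with class details) and uses a shortened, made-up HTML string; interview_rows is a hypothetical helper name. It compares against get_text() rather than using tr.find('td', text='Challenge'), because that argument is matched against the cell's .string, which is None when the cell holds a span as well as text.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of one candidate table from the question.
SAMPLE_HTML = """
<table class="person show-interviews interviews-loaded" application="43352812">
<tr class="interview spicy" step_id="553194">
<td class="name"><span class="interview-kit-icon small"></span>Case Study
</td>
<td class="details"><div class="interview-info">Skipped <span class="unskip-link">Unskip</span></div></td>
</tr>
<tr class="interview spicy" step_id="4883928">
<td class="name"><span class="interview-kit-icon small"></span>Challenge
</td>
<td class="details schedulable removable"><div class="interview-info">Schedule Interview</div></td>
</tr>
</table>
"""


def interview_rows(soup, name):
    """Describe each interview row whose name cell contains `name`."""
    rows = []
    for tr in soup.select("tr.interview.spicy"):
        name_cell = tr.find("td", class_="name")
        if name_cell is None or name not in name_cell.get_text():
            continue
        # The details cell is the next <td> after the name cell.
        details = name_cell.find_next_sibling("td")
        rows.append({
            "step_id": tr.get("step_id"),
            "table_class": tr.find_parent("table").get("class"),
            "skipped": details is not None and "Skipped" in details.get_text(),
        })
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
challenge_rows = interview_rows(soup, "Challenge")
print(challenge_rows)
```

Each returned dictionary carries the class of the enclosing table, so the caller can still report which table the row came from.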

How to set html file converted from html to PDF using weasyprint to 100% of page width and height

Code I'm using for PDF generation:
html = HTML(string=final_html, base_url=request.build_absolute_uri())
main_doc = html.render()
pdf = main_doc.write_pdf()
This is the content of final_html string:
<body style="width:100%; height:100%;">
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
font-family: 'Open Sans', sans-serif;
font-size: 14px;
}
table {
margin-top: 0px;
}
th, td {
padding: 5px;
}
.bottom {
vertical-align: bottom;
}
tr.noBorder td {
border: 0;
}
</style> <table style="width:100%; height:100%;">
<tr>
<td COLSPAN="2" style="border-right-style: hidden;">
<div style="float: left; display:inline;">
<div>
<div><strong>-seller_name-</strong></div>
</div>
</div>
</td>
<td COLSPAN=2>
<div style="float: right; display:inline;">
<div style="text-align: center">
<strong>-label_name-</strong>
</div>
<div>
-crnbarcodeimage-
</div>
<div style="text-align: center">
<strong>*-label_number-*</strong>
</div>
</div>
</td>
</tr>
<tr>
<td COLSPAN=2>Name & Delivery Address</td>
<td style="border-right-style: hidden;">Payment Mode</td>
<td style="float: right; border-left-style: hidden; border-top-style: hidden; border-bottom-style: hidden;">
<strong>-order_type-</strong></td>
</tr>
<tr>
<td COLSPAN=2>
<div><strong>-drop_name-</strong></div>
<br>
<div>-drop_address-</div>
<br>
<div>-drop_state- <strong>-drop_pincode-</strong></div>
<br>
<div><strong>Contact Number: -drop_phone-</strong></div>
</td>
<td valign="top" COLSPAN=2>
<div style="float: left;">
<strong>Order No.:</strong>
</div>
<div style="float: right;">
-seller_order_id-
</div>
<div>
<div>
-seller_order_id_barcode-
</div>
</div>
<div style="float: left;">
<strong>Invoice No.</strong>
</div>
<div style="float: right;">
-invoice_number-
</div>
</td>
</tr>
<tr>
<td COLSPAN=4 ALIGN=RIGHT>
</td>
</tr>
<tr>
<td>Description</td>
<td>QTY</td>
<td>Rate</td>
<td>Amount</td>
</tr>
<tr>
<td>-item-</td>
<td>1</td>
<td>-invoice_value-</td>
<td>-invoice_value-</td>
</tr>
<tr>
<td COLSPAN=3 ALIGN=LEFT style="border-right-style:hidden;">Total</td>
<td COLSPAN=1 ALIGN=LEFT style="border-left-style:hidden;">-invoice_value-</td>
</tr>
<tr>
<td COLSPAN=3 ALIGN=LEFT style="border-right-style:hidden;"><strong>COD Amount</strong></td>
<td COLSPAN=1 ALIGN=LEFT style="border-left-style:hidden;"><strong>-cod_value-</strong></td>
</tr>
<tr>
<td COLSPAN=4>
Prices are inclusive of all applicable taxes
</td>
</tr>
<tr>
<td COLSPAN=4 style="border-bottom-style:hidden;">If Undelivered, please return to:</td>
</tr>
<tr>
<td COLSPAN=4>
<strong>
<div>B-220/2, 1st Floor, Right Door, Savitri Nagar, New Delhi: 110017 Ph. 8376035546</div>
</strong>
</td>
</tr> </table> </body> </html>
The generated PDF always fills only part of the page, while I want the content to cover the entire PDF page.
I think you might have to add this to your styles to configure the page:
@page {
    size: 11cm 14.1cm;
    margin-left: 0.5cm;
    margin-top: 0.5cm;
}
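WeasyPrint takes page size and margins from a CSS @page at-rule, so one way to apply this suggestion without editing the HTML template is to pass the rule as an extra stylesheet. This is only a sketch, assuming WeasyPrint is installed; build_page_css and render_pdf are hypothetical helper names, and the 11cm by 14.1cm size is simply the value quoted in the answer, not a recommendation.

```python
def build_page_css(width_cm, height_cm, margin_cm=0.5):
    """Build an @page rule that fixes the PDF page size and two margins."""
    return (
        "@page {\n"
        f"    size: {width_cm}cm {height_cm}cm;\n"
        f"    margin-left: {margin_cm}cm;\n"
        f"    margin-top: {margin_cm}cm;\n"
        "}\n"
    )


def render_pdf(html_string, page_css, base_url=None):
    """Render html_string to PDF bytes with page_css applied as a stylesheet."""
    # Imported lazily so the CSS helper can be used without WeasyPrint.
    from weasyprint import CSS, HTML

    document = HTML(string=html_string, base_url=base_url)
    return document.write_pdf(stylesheets=[CSS(string=page_css)])


page_css = build_page_css(11, 14.1)
print(page_css)
```

With the variables from the question, the call would look like pdf = render_pdf(final_html, page_css, base_url=request.build_absolute_uri()). Choosing a page size close to the label's natural size is what lets the table with width:100% span the whole page.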
