The HTML below is parsed from Yahoo Finance:
<TABLE class="yfnc_tabledata1" width="100%" cellpadding="0" cellspacing="0" border="0">
<TR>
<TD>
<TABLE width="100%" cellpadding="2" cellspacing="0" border="0">
<TR class="yfnc_modtitle1" style="border-top:none;">
<td colspan="2" style="border-top:2px solid #000;">
<small>
<span class="yfi-module-title">Period Ending</span>
</small>
</td>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2014</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2013</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2012</th>
</TR>
<tr>
<td colspan="2">
<strong>
Total Revenue
</strong>
</td>
<td align="right">
<strong>
4,479,648
</strong>
</td>
<td align="right">
<strong>
3,777,068
</strong>
</td>
<td align="right">
<strong>
3,209,782
</strong>
</td>
</tr>
<tr>
<td colspan="2">Cost of Revenue</td>
<td align="right">3,160,470 </td>
<td align="right">2,656,189 </td>
<td align="right">2,284,485 </td>
</tr>
</TABLE>
</TD>
</TR>
</TABLE>
I would like to select all the HTML that comes after this part:
<TABLE class="yfnc_tabledata1" width="100%" cellpadding="0" cellspacing="0" border="0">
<TR>
<TD>
<TABLE width="100%" cellpadding="2" cellspacing="0" border="0">
<TR class="yfnc_modtitle1" style="border-top:none;">
<td colspan="2" style="border-top:2px solid #000;">
<small>
<span class="yfi-module-title">Period Ending</span>
</small>
</td>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2014</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2013</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2012</th>
</TR>
How do I achieve this with BeautifulSoup's select() method, or is there an alternative way?
PS:
Please point me to the documentation if you can.
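A minimal sketch, assuming the markup shown above and BeautifulSoup 4 with its soupsieve CSS engine (bundled since 4.7): select() understands the general sibling combinator ~, so tr.yfnc_modtitle1 ~ tr matches every row that follows the "Period Ending" header row inside the same table. The function name income_rows, the trimmed sample and the conversion to int are illustrative additions.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Yahoo Finance markup from the question.
YAHOO_HTML = """
<TABLE class="yfnc_tabledata1"><TR><TD>
<TABLE>
<TR class="yfnc_modtitle1">
<td colspan="2"><small><span class="yfi-module-title">Period Ending</span></small></td>
<th scope="col">Dec 31, 2014</th><th scope="col">Dec 31, 2013</th><th scope="col">Dec 31, 2012</th>
</TR>
<tr><td colspan="2"><strong>Total Revenue</strong></td>
<td align="right"><strong>4,479,648</strong></td>
<td align="right"><strong>3,777,068</strong></td>
<td align="right"><strong>3,209,782</strong></td></tr>
<tr><td colspan="2">Cost of Revenue</td>
<td align="right">3,160,470 </td><td align="right">2,656,189 </td><td align="right">2,284,485 </td></tr>
</TABLE>
</TD></TR></TABLE>
"""


def income_rows(html):
    """Return {row label: {period: value}} for every row after the header row."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.select_one("table.yfnc_tabledata1 tr.yfnc_modtitle1")
    # The <th> cells of the header row hold the period dates.
    periods = [th.get_text(strip=True) for th in header.find_all("th")]
    rows = {}
    # "~" is the CSS general sibling combinator: every <tr> after the header row.
    for tr in soup.select("table.yfnc_tabledata1 tr.yfnc_modtitle1 ~ tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        label, values = cells[0], cells[1:]
        rows[label] = {p: int(v.replace(",", "")) for p, v in zip(periods, values)}
    return rows
```

Without CSS selectors, header.find_next_siblings("tr") returns the same rows; the select() form is documented in the "CSS selectors" section of the BeautifulSoup documentation.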
I am trying to fill in a form online using Selenium, and at some point I have to enter a date. I can't use send_keys() because the page does not allow it. Instead, when I click the date field, a datepicker window pops up and prompts me to select the year, which I can do successfully.
After I pick the year, that window is hidden and a new one prompting for the month is displayed. The page does this by changing the month window's style from display: none to display: block, and the year window's style from display: block to display: none.
The problem is that although is_displayed() and is_enabled() both return True for the new window, calling is_displayed() on the elements inside it returns False, even though is_enabled() returns True for them.
I think I should refresh the driver's view of the DOM, but driver.refresh() reloads the page and puts me back at step 0, where I have to pick the year again.
This is my code:
# Code for selecting year (Works)
dateWindow = driver.find_element_by_xpath('/html/body/div[9]/div[3]/table')
rows = dateWindow.find_elements_by_tag_name("tr")
rows[1].find_element_by_xpath('//span[text()="%s"]' % str_year).click()
# Code for selecting month (Does not work)
dateWindow = driver.find_element_by_xpath('/html/body/div[9]/div[2]/table')
rows = dateWindow.find_elements_by_tag_name("tr")
rows[1].find_element_by_xpath('//span[text()="%s"]' % str_month).click()
In the last line, I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
This is the HTML of the page before selecting the year:
<div class="datepicker-days" style="display: none;">
<table class=" table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">June 1993</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
<tr>
<th class="dow">Su</th>
<th class="dow">Mo</th>
<th class="dow">Tu</th>
<th class="dow">We</th>
<th class="dow">Th</th>
<th class="dow">Fr</th>
<th class="dow">Sa</th>
</tr>
</thead>
<tbody>
<tr>
<td class="old day">30</td>
<td class="old day">31</td>
<td class="day">1</td>
<td class="day">2</td>
<td class="day">3</td>
<td class="day">4</td>
...
<td class="day">29</td>
<td class="day">30</td>
<td class="new day">1</td>
<td class="new day">2</td>
<td class="new day">3</td>
</tr>
<tr>
<td class="new day">4</td>
<td class="new day">5</td>
<td class="new day">6</td>
<td class="new day">7</td>
<td class="new day">8</td>
<td class="new day">9</td>
<td class="new day">10</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-months" style="display: none;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">1993</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7">
<span class="month">Jan</span>
<span class="month">Feb</span>
<span class="month">Mar</span>
<span class="month">Apr</span>
<span class="month">May</span>
<span class="month">Jun</span>
<span class="month">Jul</span>
<span class="month">Aug</span>
<span class="month">Sep</span>
<span class="month">Oct</span>
<span class="month">Nov</span>
<span class="month">Dec</span>
</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-years" style="display: block;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">1990-1999</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7">
<span class="year old">1989</span>
<span class="year">1990</span>
<span class="year">1991</span>
<span class="year">1992</span>
<span class="year">1993</span>
<span class="year active">1994</span>
<span class="year">1995</span>
<span class="year">1996</span>
<span class="year">1997</span>
<span class="year">1998</span>
<span class="year">1999</span>
<span class="year new">2000</span>
</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
This is the HTML of the page after selecting the year and before selecting the month:
<div class="datepicker-days" style="display: none;">
<table class=" table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">June 1993</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
<tr>
<th class="dow">Su</th>
<th class="dow">Mo</th>
<th class="dow">Tu</th>
<th class="dow">We</th>
<th class="dow">Th</th>
<th class="dow">Fr</th>
<th class="dow">Sa</th>
</tr>
</thead>
<tbody>
<tr>
<td class="old day">30</td>
<td class="old day">31</td>
<td class="day">1</td>
<td class="day">2</td>
<td class="day">3</td>
<td class="day">4</td>
...
<td class="day">29</td>
<td class="day">30</td>
<td class="new day">1</td>
<td class="new day">2</td>
<td class="new day">3</td>
</tr>
<tr>
<td class="new day">4</td>
<td class="new day">5</td>
<td class="new day">6</td>
<td class="new day">7</td>
<td class="new day">8</td>
<td class="new day">9</td>
<td class="new day">10</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-months" style="display: block;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">1993</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7">
<span class="month">Jan</span>
<span class="month">Feb</span>
<span class="month">Mar</span>
<span class="month">Apr</span>
<span class="month">May</span>
<span class="month">Jun</span>
<span class="month">Jul</span>
<span class="month">Aug</span>
<span class="month">Sep</span>
<span class="month">Oct</span>
<span class="month">Nov</span>
<span class="month">Dec</span>
</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-years" style="display: none;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: visible;">«</th>
<th colspan="5" class="datepicker-switch">1990-1999</th>
<th class="next" style="visibility: visible;">»</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7">
<span class="year old">1989</span>
<span class="year">1990</span>
<span class="year">1991</span>
<span class="year">1992</span>
<span class="year">1993</span>
<span class="year active">1994</span>
<span class="year">1995</span>
<span class="year">1996</span>
<span class="year">1997</span>
<span class="year">1998</span>
<span class="year">1999</span>
<span class="year new">2000</span>
</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
Any ideas? Thanks in advance
The desired element is a dynamic element, so when selecting the month you have to induce WebDriverWait for element_to_be_clickable(). You can use the following locator strategy:
Using XPATH:
dateWindow = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[9]/div[2]/table")))
rows = dateWindow.find_elements(By.TAG_NAME, "tr")
# The leading "." keeps the search inside rows[1]; a bare "//" searches the whole page
rows[1].find_element(By.XPATH, './/span[text()="%s"]' % str_month).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I want to change the border color of my table. I'm using the .table-bordered class, which draws grey/silver borders, and I want them to be black without adding CSS rules. I tried style="border-color:black;" on the table, but it doesn't change anything. Any ideas?
This is my XML file:
<table class="table table-bordered" style="border-color:black;">
<tr class="border-black">
<th class="text-center">
<h3> Bulletin de paie </h3>
</th>
<th class="text-center" align="center">
Adresse
<br></br>
<span t-field="o.adresse"/>
</th>
<th class="text-center" align="center">
Date de paie
<br></br>
<span t-field="o.datedepaie"/>
</th>
</tr>
<tr bordercolor="black">
<th class="text-center">
<strong>Matricule</strong>
<br></br>
<span t-field="o.matricule"/>
</th>
<th class="text-center">
<strong>Nom et Prénom</strong>
<br></br>
<span t-field="o.name"/>
</th>
<th class="text-center">
<strong>CNSS</strong>
<br></br>
<span t-field="o.cnss"/>
</th>
</tr>
<tr bordercolor="black">
<th class="text-center">
<strong>Date de naissance</strong>
<br></br>
<span t-field="o.datenaissance"/>
</th>
<th class="text-center">
<strong>Date d'embauche</strong>
<br></br>
<span t-field="o.dateembauche"/>
</th>
<th class="text-center">
<strong>Fonction</strong>
<br></br>
<span t-field="o.fonction"/>
</th>
</tr>
</table>
<table class="table table-bordered">
<thead>
<tr>
<th>Libellé</th>
<th>Base</th>
<th>Taux (%)</th>
<th>Gains</th>
<th>Retenues</th>
</tr>
</thead>
<tbody class="invoice_tbody">
<tr t-foreach="o.salaire_id" t-as="l">
<th><span t-field="l.libelle"/></th>
<th><span t-field="l.base"/></th>
<th><span t-field="l.taux"/></th>
<th><span t-field="l.gains"/></th>
<th><span t-field="l.retenues"/></th>
</tr>
<tr>
<th colspan="3" class="text-right">
<strong>Total</strong>
</th>
<th class="text-center">
<span t-field="o.total"/>
</th>
<th class="text-center">
<span t-field="o.totall"/>
</th>
</tr>
<tr>
<th colspan="5" class="text-right">
<strong>Net à payer:</strong>
<span t-field="o.net"/>
<br></br>
</th>
</tr>
</tbody>
</table>
</div>
</div>
<div class="footer">
IIRC, the border color is set on the rows, tds and ths, not on the table itself. That's off the top of my head, though.
You'd need to change their border color, not the table's.
You should post your CSS file; it would help to see the existing styles. Also, line breaks don't need closing tags, and I think your <br></br> tags are breaking the HTML.
You only need <br />.
Is there a recommended way to use BeautifulSoup 4 in Python when a table has no class or attribute values?
I was considering just using get_text() to dump the text, but if I wanted to pick out individual values or break the table into more discrete sections, how would I go about it?
<table cellpadding="0" cellspacing="0" id="programmeDescriptor" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="1">
Awards
</th>
</tr>
<tr>
</tr>
<tr>
<td>
Ordinary Bachelor Degree
</td>
</tr>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Programme Code:
</th>
<td width="150">
CodeValue
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Mode of Delivery:
</th>
<td width="150">
Full Time
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
No. of Semesters:
</th>
<td width="150">
6
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
NFQ Level:
</th>
<td width="150">
7
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Embedded Award:
</th>
<td width="150">
No
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="160">
Department:
</th>
<td>
Computing
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Programme Outcomes
</h3>
<p class="info">
On successful completion of this programme the learner will be able to :
</p>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="30">
PO1
</th>
<td class="head" colspan="2">
Knowledge - Breadth
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO2
</th>
<td class="head" colspan="2">
Knowledge - Kind
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO3
</th>
<td class="head" colspan="2">
Skill - Range
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO4
</th>
<td class="head" colspan="2">
Skill - Selectivity
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO5
</th>
<td class="head" colspan="2">
Competence - Context
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>Some block of text</td>
</tr>
<tr>
<th width="30">
PO6
</th>
<td class="head" colspan="2">
Competence - Role
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO7
</th>
<td class="head" colspan="2">
Competence - Learning to Learn
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO8
</th>
<td class="head" colspan="2">
Competence - Insight
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• The graduate will demonstrate the ability to specify, design and build an IT system or research & report on a current IT topic
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Semester Schedules
</h3>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3897" target="_blank">
Web & User Experience
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3881" target="_blank">
Software Development 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1645" target="_blank">
Computer Architecture
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2328" target="_blank">
Discrete Mathematics 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3848" target="_blank">
Business & Information Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2054" target="_blank">
Learning to Learn at Third Level
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3886" target="_blank">
Software Development 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3895" target="_blank">
Object Oriented Systems Analysis
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3875" target="_blank">
Database Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3874" target="_blank">
Operating Systems Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2330" target="_blank">
Statistics
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2527" target="_blank">
Social Media Communications
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web & Mobile Design & Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3876" target="_blank">
Database Design And Programming
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3869" target="_blank">
Software Development 3
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3873" target="_blank">
Software Quality Assurance and Testing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3629" target="_blank">
Networking 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2477" target="_blank">
Discrete Mathematics 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3862" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis & Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web & Mobile Design & Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3630" target="_blank">
Networking 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3870" target="_blank">
Software Development 4
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2476" target="_blank">
Management Science
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis & Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1721" target="_blank">
Cloud Services & Distributed Computing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation & Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3878" target="_blank">
Web Application Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1689" target="_blank">
Algorithms and Data Structures 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3896" target="_blank">
Advanced Databases
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2465" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1728" target="_blank">
Algorithms and Data Structures 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1675" target="_blank">
Network Management
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation & Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1679" target="_blank">
Object Oriented Analysis & Design 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
First of all, the outermost table, the parent of all the other tables, has an id attribute, so let's make it the base for the search:
super_table = soup.find("table", id="programmeDescriptor")
Then, based on what you mentioned in the comment, it looks like you can tell the inner tables apart by their headers. One way to implement this is to find the header and then use find_parent() to get its parent table:
def get_table_by_header_name(super_table, header):
    # Compare stripped text: the <th> contents include surrounding newlines
    th = super_table.find(lambda tag: tag.name == "th" and tag.get_text(strip=True) == header)
    return th.find_parent("table") if th is not None else None
Usage:
desired_table = get_table_by_header_name(super_table, "Awards")
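To break the page into discrete sections, the same find_parent() idea also works from the <h4> headings: each "Stage N / Semester M" heading sits in its own wrapper table together with its module list. A minimal sketch, assuming that layout holds across the whole page (the function name semester_modules and the trimmed sample are illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the "Semester Schedules" part of the question's markup.
SCHEDULE_HTML = """
<table id="programmeDescriptor"><tr><td>
<h3>Semester Schedules</h3>
<table cellpadding="0" cellspacing="0" width="100%">
<tr><td colspan="2"><h4>
Stage 1 / Semester 1
</h4></td></tr>
<tr><td colspan="2"><table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr><td class="head" colspan="2">Mandatory</td></tr>
<tr><th width="50">Module Code</th><th>Module Title</th></tr>
<tr><td>Code</td><td><a href="index.cfm/page/module/moduleId/3881" target="_blank">
Software Development 1
</a></td></tr>
<tr><td>Code</td><td><a href="index.cfm/page/module/moduleId/1645" target="_blank">
Computer Architecture
</a></td></tr>
</table></td></tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr><td colspan="2"><h4>Stage 1 / Semester 2</h4></td></tr>
<tr><td colspan="2"><table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr><td>Code</td><td><a href="index.cfm/page/module/moduleId/3886" target="_blank">Software Development 2</a></td></tr>
</table></td></tr>
</table>
</td></tr></table>
"""


def semester_modules(html):
    """Map each semester heading to the module titles listed under it."""
    soup = BeautifulSoup(html, "html.parser")
    super_table = soup.find("table", id="programmeDescriptor")
    schedules = {}
    for h4 in super_table.find_all("h4"):
        # The nearest enclosing <table> wraps this heading and its module list only.
        wrapper = h4.find_parent("table")
        schedules[h4.get_text(strip=True)] = [
            a.get_text(strip=True) for a in wrapper.find_all("a")
        ]
    return schedules
```

Keeping a.get("href") alongside each title would also give you the module IDs.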
You can also iterate over specific tags. I don't know exactly what you want to do, but if you want the text of every <th> tag, just iterate over them and call get_text() on each.
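For the label/value tables (Programme Code:, Mode of Delivery:, ...), iterating over the <th> tags and pairing each with the <td> that follows it in the same row gives a dictionary. A sketch under the assumption that every value cell directly follows its label cell; header-only cells such as Awards are skipped, and on the full page it would also pick up pairs such as PO1 and Knowledge - Breadth. The function name programme_details is illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the summary tables from the question's markup.
DETAILS_HTML = """
<table cellpadding="0" cellspacing="0" id="programmeDescriptor" width="100%"><tr><td>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr><th colspan="1">
Awards
</th></tr>
<tr><td>Ordinary Bachelor Degree</td></tr>
</table>
<table cellpadding="5" cellspacing="0" class="borders"><tr><th width="160">
Programme Code:
</th><td width="150">
CodeValue
</td></tr></table>
<table cellpadding="5" cellspacing="0" class="borders"><tr><th width="160">Mode of Delivery:</th><td width="150">Full Time</td></tr></table>
<table cellpadding="5" cellspacing="0" class="borders"><tr><th width="160">No. of Semesters:</th><td width="150">6</td></tr></table>
<table cellpadding="5" cellspacing="0" class="borders" width="100%"><tr><th width="160">Department:</th><td>Computing</td></tr></table>
</td></tr></table>
"""


def programme_details(html):
    """Collect {label: value} from rows where a <th> label is followed by a <td> value."""
    soup = BeautifulSoup(html, "html.parser")
    super_table = soup.find("table", id="programmeDescriptor")
    details = {}
    for th in super_table.find_all("th"):
        td = th.find_next_sibling("td")
        if td is not None:  # header-only rows such as "Awards" have no value cell
            details[th.get_text(strip=True).rstrip(":")] = td.get_text(strip=True)
    return details
```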
I need help writing a script that logs in to the web interface of my Netgear GS108Tv2 smart switch and configures MAC filtering, running on my school's Windows 7 PC. I tried the mechanize library and some examples I found online, but the web interface's markup seems a little different from the example in the mechanize folder I downloaded.
Below is the code I came up with based on that example. For now I would be very glad just to be able to log in to the web interface, as I believe that is pretty much all I can manage at this point. The rest might prove a little tough, but I can work on it once I manage to log in.
I am very new to writing code and totally new to Python, so I would appreciate any help I can get. Thank you very much.
import mechanize
browser = mechanize.Browser()
browser.open("http://172.16.164.23/")
browser.select_form(nr = 0)
browser.form["pwd"] = "password"
browser.submit()
browser.open("http://172.16.164.23/base/web_main.html")
That is pretty much all I could come up with. There are no errors, but it does not seem to work.
Below is the HTML of the web interface's login page.
<html><head>
<link rel="stylesheet" href="/base/style.css" type="text/css">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <!-- Style Sheet link and Meta data -->
<title>NETGEAR GS108T</title> <!-- Netgear Page Title -->
<script src="/base/js/tabs_Layer2.js" type="text/javascript"></script>
<script src="/base/js/rollover.js" type="text/javascript"></script>
<script src="/base/js/browser.js" type="text/javascript"></script>
<script language="javascript">
var a1, a2, a3, a4, a5, a6, a7, a8;
a1 = new Image(130,29);
a1.src = "/base/images/tab_Login_off.gif";
a2 = new Image(130,29);
a2.src = "/base/images/tab_Login_on.gif";
a3 = new Image(130,29);
a3.src = "/base/images/tab_Login_ro.gif";
a4 = new Image(130,29);
a4.src = "/base/images/tab_Help_off.gif";
a5 = new Image(130,29);
a5.src = "/base/images/tab_Help_on.gif";
a6 = new Image(130,29);
a6.src = "/base/images/tab_Help_ro.gif";
a7 = new Image(130,29);
a7.src = "/base/images/onlinehelp_off.gif";
a8 = new Image(130,29);
a8.src = "/base/images/onlinehelp_on.gif";
</script>
<script type="text/javascript" language="JavaScript">
//******************************************************************************
//* Purpose: Display an input error message.
//* Return: none
//******************************************************************************
function DisplayErrorMsg()
{
alert(document.forms[0].err_msg.value);
}
function CheckError()
{
if (document.forms[0].err_flag.value == 1)
{
DisplayErrorMsg();
}
document.forms[0].pwd.focus();
}
</script>
<script language="javascript">
<!--
if (top != self)
{
top.location.href = "/base/web_main.html";
}
if (self.opener != undefined)
{
self.close();
self.opener.location.reload();
}
-->
</script>
</head>
<body onload="CheckError()">
<form method="POST" action="/base/main_login.html">
<table class="tableStyle" height="100%">
<tbody><tr class="topAlign">
<td class="leftEdge"> </td>
<td valign="top">
<table class="tableStyle">
<tbody><tr>
<td class="leftInside"><img src="/base/images/clear.gif" width="8"></td>
<td colspan="2">
<table class="tableStyle">
<tbody><tr>
<td class="logoNetGear space50Percent topAlign"><img src="/base/images/clear.gif" width="149" height="62"></td>
<td class="gs108tImage spacer50Percent topAlign rightHAlign"><img src="/base/images/clear.gif" width="190" height="60"></td> <!-- Used to get the top right logo image with help URL -->
</tr>
<tr>
<td colspan="2" class="bottomAlign">
<table class="tableStyle">
<tbody><tr>
<td class="spacer20Percent bottomAlign"><table border="0" cellpadding="0" cellspacing="0">
<tbody><tr id="tabss" class="spacer100Percent">
<td class="navTopCell"><img src="/base/images/tab_Login_on.gif" border="0"></td>
<td class="navTopCell"><img src="/base/images/tab_Help_off.gif" border="0"></td>
</tr>
</tbody></table>
</td>
<td class="loginActionCell spacer80Percent rightHAlign padding5LeftRight"></td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td class="rightHAlign"><img src="/base/images/clear.gif" width="2"></td>
</tr>
<tr class="background-blue">
<td><img src="/base/images/clear.gif" width="8"></td>
<td colspan="2" class="padding7Top spacer100Percent">
<div id="primaryNav">
<table border="0" cellpadding="0" cellspacing="0">
<tbody><tr>
<td id="cloneTd" class="navCell"> </td>
</tr>
<tr id="subTubs"></tr>
</tbody></table>
</div>
</td>
<td></td>
</tr>
</tbody></table>
</td>
<td class="rightEdge"> </td>
</tr>
<tr class="topAlign">
<td valign="top" class="leftBodyNotch topAlign"> </td>
<td>
<table class="tableStyle">
<tbody><tr class="topAlign">
<td class="leftNextBodyNotch"><img src="/base/images/clear.gif" width="11" height="16"></td>
<td class="middleBodyNotch spacer100Percent"> </td>
<td class="rightNextBodyNotch"><img src="/base/images/clear.gif" width="11"></td>
</tr>
</tbody></table>
</td>
<td class="rightBodyNotch"> </td>
</tr>
<tr height="100%">
<td rowspan="2" class="leftEdge"> </td>
<td valign="top">
<table class="tableStyle" height="100%">
<tbody><tr>
<td class="leftInside"><img src="/base/images/clear.gif" width="8"></td>
<td class="spacer100Percent loginTable topAlign">
<table class="loginBox">
<tbody><tr>
<td colspan="3">
<table class="tableStyle">
<tbody><tr>
<td colspan="2" class="subSectionTabTopLeft spacer80Percent font12BoldBlue">Login</td>
<td class="subSectionTabTopRight spacer20Percent"><img src="/base/images/help_icon.gif" width="12" height="12" title="Click for help"></td></tr><tr><td colspan="3" class="subSectionTabTopShadow"> </td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="subSectionBodyDot"> </td>
<td class="paddingsubSectionBodyNone">
<table class="tableStyle">
<tbody><tr>
<td class="font10Bold padding4Top">Password</td>
<td class="padding4Top"><input class="input" type="PASSWORD" name="pwd" maxlength="20" value=""></td>
</tr>
<tr>
<td colspan="2" class="padding5TopBottom10Right"><input type="IMAGE" name="login" src="/base/images/login_on.gif"></td>
</tr>
</tbody></table>
</td>
<td class="subSectionBodyDotRight"> </td>
</tr>
<tr>
<td colspan="3" class="subSectionBottom"> </td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td rowspan="2" class="rightEdge"> </td>
</tr>
<tr>
<td>
<table class="tableStyle">
<tbody><tr>
<td colspan="3" class="topBottomDivider" id="cloneTd"><img src="/base/images/clear.gif" height="3"></td>
</tr>
<tr>
<td colspan="3" class="footerBody">
<table class="tableStyle rightHAlign" align="right">
<tbody><tr>
<td id="ButtonsDiv"></td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="leftEdgeFooter"><img src="/base/images/clear.gif" width="11" height="9"></td>
<td>
<table class="tableStyle">
<tbody><tr>
<td class="leftBottomDivider"><img src="/base/images/clear.gif" width="11" height="9"></td>
<td class="middleBottomDivider spacer100Percent"><img src="/base/images/clear.gif" height="9"></td>
<td class="rightBottomDivider spacer1Percent"><img src="/base/images/clear.gif" height="9"></td>
</tr>
</tbody></table>
</td>
<td class="rightEdgeFooter"><img src="/base/images/clear.gif" width="11" height="9"></td>
</tr>
<tr>
<td class="leftCopyrightFooter"><img src="/base/images/clear.gif" width="11" height="9"></td>
<td class="middleCopyrightDivider">
<table class="blue10 tableStyle">
<tbody><tr class="topAlign">
<td>Copyright © 1996-2013 NETGEAR ®</td>
</tr>
</tbody></table>
</td>
<td class="rightCopyrightFooter"><img src="/base/images/clear.gif" width="11" height="9"></td>
</tr>
</tbody></table>
<input type="hidden" name="err_flag" size="16" maxlength="15" value="0">
<input type="hidden" name="err_msg" size="128" maxlength="512" value="">
</form>
</body></html>
Here is the HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div bgcolor="#48486c">
<table width="720" border="0" cellspacing="0" cellpadding="0" align="center" background="http://title.jpg" height="130">
<tr height="129">
<td width="719" height="129"></td>
<td width="1" height="129"></td>
</tr>
<tr height="1">
<td width="720" height="1"></td>
<td width="1" height="1"></td>
</tr>
</table>
<table width="720" border="0" cellspacing="0" cellpadding="0" align="center" height="203">
<tr height="20">
<td width="719" height="20"></td>
<td width="1" height="20"></td>
</tr>
<tr height="69">
<td width="719" height="69" valign="top" align="left">
<table width="719" border="1" cellspacing="2" cellpadding="0">
<tr>
<td bgcolor="a5fdf8" width="390"><b>Stream Name</b></td>
<td bgcolor="a5fdf8" width="61"><b>Status</b></td>
<td bgcolor="a5fdf8" width="61"><b>Duration</b></td>
<td bgcolor="a5fdf8" width="185"><b>Start</b></td>
</tr>
<tr bgcolor="white">
<td width="390">c:\streams\ours\Sony_AVCHD_<WBR>Test_Discs_60Hz_00001.m2ts</td>
<td width="61"><font color="#D0D0D0">----</font></td>
<td width="61">00:00:02</td>
<td width="185">2010/06/15-15:06:17</td>
</tr>
</table>
</td>
<td width="1" height="69"></td>
</tr>
<tr height="113">
<td width="720" height="113" colspan="2" valign="top" align="left">
<table width="721" border="1" cellspacing="2" cellpadding="0">
<tr bgcolor="a5fdf8">
<td width="299"><b>Test Category</b></td>
<td width="61"><b>Error</b></td>
<td width="62"><b>Warning</b></td>
<td width="275"><b>Details</b></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#099eac">All Tests (Sony_AVCHD_Test_Discs_60Hz_<WBR>00001.m2ts)</font></td>
<td width="61"><font color="#ff0000">34787</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#800000"> ETSI TR-101-290 Tests</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#800000"> ISO/IEC Transport Stream Tests</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#800000"> System Data T-STD Tests</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#099eac"> Prog(1)</font></td>
<td width="61"><font color="#ff0000">34787</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#099eac"> VES(0xe0)</font></td>
<td width="61"><font color="#ff0000">34787</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#1010F0"> H.264/AVC Conformance</font></td>
<td width="61"><font color="#ff0000">34718</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275">
<a><font color="#ff0000">Sony_AVCHD_Test_Discs_60Hz_<WBR>00001.m2ts_Prog(1)_PID(0x1011)<WBR>_H264_Conf.txt</font></a><br>
</td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> Sequence</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> Picture</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> Slice</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> Macroblock</font></td>
<td width="61"><font color="#ff0000">34718</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> Block</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#1010F0"> HRD Tests</font></td>
<td width="61"><font color="#ff0000">69</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275">
<a><font color="#ff0000">Sony_AVCHD_Test_Discs_60Hz_<WBR>00001.m2ts_Prog(1)_PID(0x1011)<WBR>_H264_HRD.txt</font></a><br>
</td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#101010"> HRD level</font></td>
<td width="61"><font color="#ff0000">69</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#800000"> Video T-STD Tests</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#099eac"> AES(0xfd)</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="61"><font color="#000000">0</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#808080"> Audio Level Tests</font></td>
<td width="61"><font color="#808080">Disabled</font></td>
<td width="61"><font color="#808080">Disabled</font></td>
<td width="275"></td>
</tr>
<tr bgcolor="white">
<td width="299"><font color="#800000"> Audio T-STD Tests</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="61"><font color="#800000">No Lic</font></td>
<td width="275"></td>
</tr>
</table>
</td>
</tr>
<tr height="1">
<td width="719" height="1"></td>
<td width="1" height="1"></td>
</tr>
</table>
</div>
</body></html>
Is there a Python library that can do this?
Thanks.
BeautifulSoup gets you almost all the way there:
>>> from bs4 import BeautifulSoup
>>> with open('a.html') as f:
...     soup = BeautifulSoup(f, 'html.parser')
...
>>> with open('a.xml', 'w') as g:
...     print(soup.prettify(), file=g)
...
This closes all tags properly. The only remaining issue is that the doctype is still HTML. To change it to the doctype of your choice, you only need to replace the first line of the prettified output instead of printing it directly, e.g.:
>>> lines = soup.prettify().splitlines()
>>> lines[0] = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
...             '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
>>> with open('a.xml', 'w') as g:
...     print('\n'.join(lines), file=g)
...
lxml works well:
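from lxml import html, etree
# Parse the (possibly malformed) HTML; lxml's HTML parser repairs unclosed tags
with open('a.html', 'rb') as f:
    doc = html.fromstring(f.read())
# etree.tostring() serializes as XML by default, so every element is closed
with open('a.xhtml', 'wb') as out:
    out.write(etree.tostring(doc))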
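The BeautifulSoup answer swaps the doctype by editing the first output line. With lxml, the same thing can be done at serialization time through the `doctype` keyword of `etree.tostring()`. A minimal sketch, assuming an XHTML 1.0 Transitional doctype is what you want; `html_to_xhtml_bytes` and `XHTML_DOCTYPE` are hypothetical names chosen for illustration:

```python
from lxml import etree, html

# Assumed target doctype (XHTML 1.0 Transitional); replace with the one you need
XHTML_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                 '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')


def html_to_xhtml_bytes(source):
    """Hypothetical helper: parse HTML with lxml, return XML bytes with a new doctype."""
    root = html.fromstring(source)
    # XML output is the default method; the doctype string is written before the root
    return etree.tostring(root, doctype=XHTML_DOCTYPE, encoding="utf-8",
                          pretty_print=True)
```

To convert the file from the question, read `a.html` in `'rb'` mode, pass the bytes to the function, and write the result to `a.xhtml` in `'wb'` mode. Note that this only changes the doctype and closes tags; it does not move the elements into the XHTML namespace.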
To piggyback off @Alex Martelli: as of Python 2.5, the standard library's `xml` package includes `xml.etree.ElementTree`, so XML support comes built in:
https://docs.python.org/3.6/library/xml.html
You could strip off the HTML tags, format the result as XML, and use the built-in XML library instead of bringing in another dependency. This is only advisable if you trust the source of the document, because the standard-library XML modules are susceptible to the usual XML vulnerabilities (see the warning in the documentation linked above).
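A stdlib-only sketch of the "no extra dependency" idea. Rather than stripping tags, it swaps in `html.parser.HTMLParser` to read the HTML and replays its events into `xml.etree.ElementTree.TreeBuilder`, which yields well-formed XML. Assumptions to keep in mind: the class `HTMLToXML`, the function `html_to_xml`, the synthetic `<root>` wrapper and the void-element list are all hypothetical; comments and the doctype are dropped; and HTML's implicit-close rules are not implemented, so an unclosed `<p>` or `<td>` gets nested rather than closed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements HTML never closes explicitly (assumed list; extend as needed)
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class HTMLToXML(HTMLParser):
    """Hypothetical helper: replays HTMLParser events into an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.builder.start("root", {})  # synthetic wrapper: XML needs one root
        self.stack = ["root"]           # names of currently open elements

    @staticmethod
    def _attrs(attrs):
        # Valueless attributes such as <input disabled> get an empty value
        return {name: (value or "") for name, value in attrs}

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, self._attrs(attrs))
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)       # close <br>, <img>, ... immediately
        else:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: open and close in one step
        self.builder.start(tag, self._attrs(attrs))
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack[1:]:
            return                      # stray end tag with no open element
        # Implicitly close anything still open inside this element
        while self.stack[-1] != tag:
            self.builder.end(self.stack.pop())
        self.builder.end(self.stack.pop())

    def handle_data(self, data):
        self.builder.data(data)

    def close(self):
        super().close()                 # flush any buffered text
        while self.stack:               # close everything left open, root last
            self.builder.end(self.stack.pop())
        return self.builder.close()


def html_to_xml(source):
    """Return `source` re-serialized as a well-formed XML string."""
    parser = HTMLToXML()
    parser.feed(source)
    return ET.tostring(parser.close(), encoding="unicode")
```

For example, `html_to_xml('<p>one<br>two')` returns `'<root><p>one<br />two</p></root>'`, which `ET.fromstring()` can then parse like any other XML.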