I am trying to extract the text located in tables on a dynamic page: sometimes there is one table and sometimes there are several. So my question is: how do I pick the text from these tables?
This is what I tried:
from selenium import webdriver
# webdriver
browser = webdriver.Chrome("C:/Chrome/chromedriver.exe")
browser.get("http://homepage")
pick = browser.find_elements_by_xpath("//*[@id=\"xpath\"]/table[11]/tbody/tr/td/table[2]/tbody/tr/td[1]/span[2]")
pick.get_attribute("innerHTML")
and these are the XPaths of the individual elements:
//*[@id="xpath"]/table[11]/tbody/tr/td/table[2]/tbody/tr/td[1]/span[2]
//*[@id="xpath"]/table[11]/tbody/tr/td/table[3]/tbody/tr/td[1]/span[2]
//*[@id="xpath"]/table[11]/tbody/tr/td/table[4]/tbody/tr/td[1]/span[2]
and this is the HTML code:
<table style="width:700px; " border="0" cellpadding="0" cellspacing="0" width="700px">
<tbody>
<tr>
<td style="border:1px; border-style:solid; ">
<table style="width:700px; " border="0" cellpadding="0" cellspacing="0" width="700px">
<tbody>
<tr>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:250px; " valign="top" width="250px"><span> </span><span style="font-size:10pt; font-weight:bold; "> </span></td>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:450px; " valign="middle" width="450px"><span style="font-size:10pt; "> </span><br></td>
</tr>
</tbody>
</table>
<table style="width:700px; " border="0" cellpadding="0" cellspacing="0" width="700px">
<tbody>
<tr>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:250px; " valign="top" width="250px"><span> </span><span style="font-size:10pt; ">3</span></td>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:450px; " valign="middle" width="450px"><span style="font-size:10pt; ">Bleaching preparations and other substances for laundry use; cleaning, polishing, scouring and abrasive preparations; soaps; perfumery, essential oils, cosmetics, hair lotions; dentifrices (all the goods listed alphabetically in the Nice Classification, included in this class).</span><br></td>
</tr>
</tbody>
</table>
<table style="width:700px; " border="0" cellpadding="0" cellspacing="0" width="700px">
<tbody>
<tr>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:250px; " valign="top" width="250px"><span> </span><span style="font-size:10pt; ">4</span></td>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:450px; " valign="middle" width="450px"><span style="font-size:10pt; ">Industrial oils and greases; lubricants; dust absorbing, wetting and binding compositions; fuels (including motor spirit) and illuminants; candles, wicks (all goods of this class included in the alphabetical list of Nice Classification).</span><br></td>
</tr>
</tbody>
</table>
<table style="width:700px; " border="0" cellpadding="0" cellspacing="0" width="700px">
<tbody>
<tr>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:250px; " valign="top" width="250px"><span> </span><span style="font-size:10pt; ">5</span></td>
<td style="border:1px; border-bottom-color:silver; border-bottom-style:solid; width:450px; " valign="middle" width="450px"><span style="font-size:10pt; ">Pharmaceutical and veterinary preparations; sanitary preparations for medical purposes; dietetic foods and substances adapted for medical and veterinary use; food for babies; dietary supplements for humans and animals;plasters, materials for dressings; material for stopping teeth and dental wax; disinfectants; preparations for destroying vermin; fungicides, herbicides;(all goods of this class included in the alphabetical list of Nice Classification).</span><br></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
thank you for any help!
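One possible way to cope with a varying number of tables is to drop the table index from the path, so the expression matches every inner table at once. The sketch below is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree on a simplified, well-formed stand-in for the HTML above (the SAMPLE string and the class_numbers helper are hypothetical, not part of the original page). With Selenium the same idea would be to remove the [2] after the inner table in the XPath and loop over the list that find_elements_by_xpath returns, reading .text from each element.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the nested tables in the question
# (hypothetical data; the real markup would come from the rendered page).
SAMPLE = """
<div id="xpath">
  <table><tbody><tr><td>
    <table><tbody><tr><td><span> </span><span> </span></td><td><span> </span></td></tr></tbody></table>
    <table><tbody><tr><td><span> </span><span>3</span></td><td><span>Bleaching preparations</span></td></tr></tbody></table>
    <table><tbody><tr><td><span> </span><span>4</span></td><td><span>Industrial oils</span></td></tr></tbody></table>
    <table><tbody><tr><td><span> </span><span>5</span></td><td><span>Pharmaceutical preparations</span></td></tr></tbody></table>
  </td></tr></tbody></table>
</div>
"""


def class_numbers(markup):
    """Return the text of the second span in the first cell of every inner table."""
    root = ET.fromstring(markup)
    # No index on the inner table step, so one table or many are matched alike.
    spans = root.findall("./table/tbody/tr/td/table/tbody/tr/td[1]/span[2]")
    # Skip spans that hold only whitespace (the empty header table).
    return [span.text.strip() for span in spans if span.text and span.text.strip()]


numbers = class_numbers(SAMPLE)
print(numbers)
```

Because the result is always a list, the calling code does not need to know in advance how many tables the page contains.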
<html>
<body>
<table style="border:0">
<tbody>
<tr class="">
<td class="pr10">Mon</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Tue</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="bold">
<td class="pr10">Wed</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Thu</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Fri</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Sat</td>
<td class="pl10">11am – 11pm</td>
</tr>
<tr class="">
<td class="pr10">Sun</td>
<td class="pl10">11am – 11pm</td>
</tr>
</tbody>
</table>
</body>
</html>
Try 1:
driver.find_elements_by_xpath("//*[@class='pr10']")
Try 2:
driver.find_element_by_xpath("//tr[td='Mon']/td").text
But neither attempt fetches the text "Mon" "11am - 11pm".
text_area = driver.find_elements_by_xpath("//*[@class='pr10']")
for items2 in text_area:
    print(items2.text)
Try this instead:
text_area = driver.find_elements_by_xpath("""//*[@id="body"]/table/tbody/tr[1]/td[1]""")
print([elm.get_attribute('innerHTML') for elm in text_area])
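If both the day and the hours are wanted together, it may help to walk the rows and read the two cells of each row by their class attribute (written @class in XPath). The following is a minimal sketch, assuming a well-formed copy of the table above; it uses the standard library's xml.etree.ElementTree instead of a live browser, and the SAMPLE string and opening_hours helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the opening-hours table (assumed sample data).
SAMPLE = """
<table style="border:0">
<tbody>
<tr class=""><td class="pr10">Mon</td><td class="pl10">11am - 11pm</td></tr>
<tr class=""><td class="pr10">Tue</td><td class="pl10">11am - 11pm</td></tr>
<tr class="bold"><td class="pr10">Wed</td><td class="pl10">11am - 11pm</td></tr>
</tbody>
</table>
"""


def opening_hours(markup):
    """Map each day name to its opening hours, one table row at a time."""
    root = ET.fromstring(markup)
    hours = {}
    for row in root.iter("tr"):
        day = row.find("td[@class='pr10']")
        times = row.find("td[@class='pl10']")
        # Ignore rows that lack either cell.
        if day is not None and times is not None:
            hours[day.text] = times.text
    return hours


hours = opening_hours(SAMPLE)
print(hours)
```

In Selenium the equivalent would be to collect the tr elements first and then search each row for its two cells with a relative XPath.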
I was scraping this website (https://www.ivolatility.com/options/RVX/) using the Python module requests. The output from selecting the first table with BeautifulSoup is shown below. Inside this first table, I am trying to get a specific value (19.17).
I would like to achieve this using BeautifulSoup, but I don't know how to select the specific cell where the value is stored.
Do any of you have any suggestions?
Output from requests:
<table border="0" bordercolor="red" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="3"><script language="JavaScript">
function submitCalcForm(event) {
event.preventDefault();
var form = document.getElementById('basicOptionsForm');
var action = form.action;
var regions = ['', 'USA', 'Europe', 'Asia', 'Canada'];
var regionsOptions = form[1];
var selectedRegion = regionsOptions.options[regionsOptions.selectedIndex].value;
var symbol = form[0].value.trim();
var location = (window.location.href.indexOf('.j')>-1)
? (form.action + '?' + form[0].name + '=' + form[0].value + '&' + form[1].name + '=' + selectedRegion)
: ('/options/'+ ((symbol == '') ? '-' : symbol ) +'/'+regions[selectedRegion]);
window.location.href= location;
}
function goToLookup() {
window.location.href= "/options/-/";
}
</script>
<form action="/options.j" id="basicOptionsForm" method="get" onsubmit="submitCalcForm(event);">
<table bgcolor="#ffffff" border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<table bgcolor="#999999" border="0" cellpadding="0" cellspacing="1">
<tr>
<td bgcolor="#567abb">
<table border="0" cellpadding="1" cellspacing="0" class="table-action">
<tr>
<td><span class="s1w" style="color: #fff;"> Symbol: </span></td><td><input class="s2" name="ticker" size="5" type="text" value="RVX"/></td><td><select class="s2" name="R"><option selected="" value="0">
ALL
</option><option value="1">
USA
</option><option value="2">
Europe
</option><option value="4">
Canada
</option></select></td><td><span class="s2"> </span></td><td><button style="background: #0C6EF8; font-weight: bold; border: 1px solid black;" type="submit">GO!</button></td><td><span class="s2"> </span></td><td><button onclick="goToLookup();" style="background: #0C6EF8; font-weight: bold; border: 1px solid black; color: white; white-space: nowrap;" type="button">
Symbol Lookup</button></td><td><span class="s2"> </span></td>
</tr>
</table>
</td>
</tr>
</table>
</td><td><img border="0" height="1" src="/design/images/0.gif" width="5"/></td><td nowrap=""><b><span class="s4">Russell 2000 Volatility Index</span></b></td><td width="100%"> </td>
</tr>
</table>
</form>
</td>
</tr>
<tr>
<td colspan="3"><img alt="." border="0" height="10" src="/design/images/0.gif" width="1"/></td>
</tr>
<tr valign="top">
<td width="100%"><script type="text/javascript">
<!--
function wr(s) {
document.write(s);
}
var d = new Array(10);
d[20]='N/A';d[25]='-94.06%';d[30]='32.03%';d[35]='34.74';d[56]='N/A';d[61]='N/A';d[66]='10-Apr';d[71]='84.49%';d[97]='N/A';d[102]='03-Oct';d[107]='29-Mar';d[112]='1.43';d[133]='N/A';d[138]='N/A';d[143]='148.97%';d[148]='98.46%';d[174]='N/A';d[179]='-46.88%';d[184]='198.21%';d[189]='0.27';d[210]='N/A';d[215]='N/A';d[220]='25-May';d[225]='110.30%';d[251]='N/A';d[256]='-68.76%';d[261]='75.38%';d[266]='0';d[287]='N/A';d[292]='N/A';d[297]='39.85%';d[302]='120.02%';d[328]='N/A';d[333]='-67.09%';d[338]='69.94%';d[343]='19.17';d[364]='N/A';d[369]='N/A';d[374]='06-Apr';d[379]='06/14/2018';d[405]='N/A';d[410]='-82.49%';d[415]='74.41%';d[441]='N/A';d[446]='N/A';d[451]='164.16%';d[456]='12.93';d[482]='N/A';d[487]='24-May';d[492]='77.70%';d[518]='N/A';d[523]='03-May';d[528]='21-May';d[533]='12/24/2018';d[559]='N/A';d[564]='59.42%';d[569]='84.78%';
wr('<table class="table-data" cellpadding=1 cellspacing=1 border=0 width=100%>');
wr('<tr bgcolor="#cccccc" align=right height=20>');
wr('<td align="center"><font class=s1>Price</font></td>');
wr('<td align="center"><font class=s1>Change (%)</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
wr('<td align="center"><font class=s1>52 wk High</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
wr('<td align="center"><font class=s1>52 wk Low</font><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
wr('<td align="center"><font class=s1>Stock volume</font>');
wr('<a href="javascript:openHelp(14)" alt="Open Help">');
wr('<img src="/design/images/ico/q_zn.gif" width=8 height=10 border=0 alt="Open Help"/>');
wr('</a><img src="/design/images/0.gif" width=4 height=1 border=0/></td>');
wr('</tr>');
wr('<tr bgcolor="#FFFFFF" align=right height=20>');
wr('<td align="center"><font class=s1>');
wr(d[343]);
wr('</font></td>');
wr('<td align="center"><font class=s1><nobr> ');
wr('<img src="/design/images/ico/up.gif" alt="+" border=0 align="absmiddle" width=7 height=9/> +');
wr(d[189]);
wr(' (+');
wr(d[112]);
wr('%)</nobr></font></td>');
wr('<td align="center"><font class=s1><nobr> ');
wr(d[35]);
wr(' ');
wr(d[533]);
wr('</nobr></font></td><td align="center"><font class=s1><nobr> ');
wr(d[456]);
wr(' ');
wr(d[379]);
wr('</nobr></font></td>');
wr('<td align="center"><font size=-2 class=s1>');
wr(d[266]);
wr('</font></td>');
wr('</tr></table>');
//-->
</script><img border="0" height="10" src="/design/images/0.gif" width="1"/><table border="0" cellpadding="0" cellspacing="0" class="table-data" width="100%">
<tr align="center" bgcolor="
#cccccc
" height="20">
<td align="center" colspan="2"><font class="s2">Current</font></td><td><font class="s2">1 WK AGO</font></td><td><font class="s2">1 MO AGO</font></td><td><font class="s2">52 wk Hi/Date</font></td><td><font class="s2">52 wk Low/Date</font></td>
</tr>
<tr>
<td align="center" bgcolor="
#FFFFFF
" colspan="5" height="20"><font class="s2" color=""> HISTORICAL VOLATILITY <a alt="Open Help" href="javascript:openHelp(4)"><img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></a></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">10 days</font></td><td><font class="s2">120.02%</font></td><td><font class="s2">84.49%</font></td><td><font class="s2">74.41%</font></td><td><font class="s2">198.21% - 29-Mar</font></td><td><font class="s2">32.03% - 21-May</font></td>
</tr>
<tr align="center" bgcolor="#eeeeee">
<td align="right"><font class="s2">20 days</font></td><td><font class="s2">110.30%</font></td><td><font class="s2">84.78%</font></td><td><font class="s2">69.94%</font></td><td><font class="s2">164.16% - 06-Apr</font></td><td><font class="s2">39.85% - 25-May</font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">30 days</font></td><td><font class="s2">98.46%</font></td><td><font class="s2">77.70%</font></td><td><font class="s2">75.38%</font></td><td><font class="s2">148.97% - 10-Apr</font></td><td><font class="s2">59.42% - 24-May</font></td>
</tr>
<tr>
<td align="center" bgcolor="
#FFFFFF
" colspan="5" height="20"><font class="s2" color=""> IMPLIED VOLATILITY <img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">IV Index call <img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr align="center" bgcolor="#eeeeee">
<td align="right"><font class="s2">IV Index put <img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">IV Index mean <img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A</font></td><td><font class="s2">N/A - N/A</font></td><td><font class="s2">N/A - N/A</font></td>
</tr>
<tr>
<td align="center" bgcolor="
#FFFFFF
" colspan="5" height="20"><font class="s2" color="">HISTORICAL 30-DAYS CORRELATION AGAINST S&P 500 Index (SPX)<img alt="Open Help" border="0" height="10" src="/design/images/ico/q_zn.gif" width="8"/></font></td>
</tr>
<tr align="center" bgcolor="#ffffff">
<td align="right"><font class="s2">30 days</font></td><td><font class="s2">-82.49%</font></td><td><font class="s2">-67.09%</font></td><td><font class="s2">-68.76%</font></td><td><font class="s2">-46.88% - 03-Oct</font></td><td><font class="s2">-94.06% - 03-May</font></td>
</tr>
</table>
</td>
</tr>
</table>
The page is dynamic, so you'd need to render it first with something like Selenium.
You can then use BeautifulSoup, or even Selenium itself, to parse the HTML once you have it. But I noticed that the value is located within <table> tags. Whenever I see a <table> tag, I usually opt to go with pandas' .read_html(), as it'll do the hard work for you.
.read_html() returns a list of dataframes; then it's just a matter of finding the data you want, or manipulating the table as needed. The data you want was found in the dataframe at index position 4 (it was also in position 0, but I chose to go with 4 since it was right there: 2nd row, first column). Then just slice that dataframe to get that specific cell:
from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
url = 'https://www.ivolatility.com/options/RVX/'
driver.get(url)
tables = pd.read_html(driver.page_source)
price = tables[4][0][1]
driver.close()
Output:
print (price)
19.17
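As a possible alternative that avoids rendering the page, note that in the requests output above the price is already present in the inline script: the array entry d[343]='19.17' is what wr(d[343]) writes into the Price cell. The sketch below pulls such assignments out with a regular expression. It is only an illustration: the SCRIPT excerpt is copied from the output in the question, the js_array_values helper is hypothetical, and the index 343 is an assumption taken from that one response, so it may well differ on another request.

```python
import re

# Excerpt of the inline script from the requests output shown in the question.
SCRIPT = """
var d = new Array(10);
d[333]='-67.09%';d[338]='69.94%';d[343]='19.17';d[364]='N/A';d[379]='06/14/2018';
wr('<td align="center"><font class=s1>');
wr(d[343]);
wr('</font></td>');
"""


def js_array_values(script_text):
    """Collect every d[N]='value' assignment into a dict keyed by N."""
    pairs = re.findall(r"d\[(\d+)\]='([^']*)'", script_text)
    return {int(index): value for index, value in pairs}


values = js_array_values(SCRIPT)
# 343 is the index used for the Price cell in this particular response.
price = values[343]
print(price)
```

With a real response, script_text would be the text of the matching script tag, and the index would have to be read from the wr(d[...]) call that follows the Price header instead of being hard-coded.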
I am scraping a website and I just want to get the first string of a node. I have already tried the child and contains functions.
The HTML code that I have:
<div id="m0" style="visibility:visible; display:block;">
<table class="fl">
<tr bgcolor="white"><td class="v px3"></td>
<td class="ch">
<a title="Id: NetViet" class="A3">NetViet</a></td>
<td class="cr" ">Clear</td>
<tr bgcolor="white"><td class="v px3"></td>
<td class="ch">
<a title="Id: Vozrojdenie.tv" class="A3">VOTV</a></td>
<tr bgcolor="white"><td class="v px3"></td>
<td class="ch">
<A class="A3" title="Id: Suryoyo Sat" HREF="http://www.suryoyosat.com/" TARGET="_blank">Suryoyo Sat</A></td>
<td class="cr" ">Clear</td>
<div id="m1" style="visibility:visible; display:block;">
<table class="fl">
<tr bgcolor="#DDD0B8"><td class="v px3"></td>
<td class="ch">
<a title="Sporadic or full 16/9 transmission"></td>
<td class="cr" ">Conax<br />Irdeto 2<br />Mediaguard 3<br />Nagravision 3<br />Viaccess 3.0</td>
<tr bgcolor="#DDD0B8"><td class="v px3"></td>
<td class="ch">
<a title="Id: Sportklub HD" class="A3">Sport Klub HD Poland</a></td>
<td class="cr" ">Conax<br />Mediaguard 3<br />Nagravision 3<br />Viaccess 3.0</td>
<tr bgcolor="#DDD0B8"><td class="v px3"></td>
<td class="ch">
<a title="Id: Animal Planet HD" class="A3">Animal Planet HD</a></td>
<td class="cr" ">Conax<br />Irdeto 2<br />Mediaguard 3<br />Nagravision 3<br />Viaccess 3.0</td>
I am using the XPath query:
encrypted=tree.xpath('//div[@id="%s"]/table[@class= "fl"]/tr/td[@class="cr"]/text()'%div)
It returns:
[['Clear','Clear','Clear'],['Conax','Irdeto 2','Mediaguard 3','Nagravision 3','Conax','Mediaguard 3','Nagravision 3', 'Viaccess 3.0',...]]
and I want it to return:
[['Clear','Clear','Clear'],['Conax','Conax','Conax',...]]
I am trying this query, but it gives me nothing:
encrypted=tree.xpath('substring-before(//div[@id="%s"]/table[@class= "fl"]/tr/td[@class="cr"]/text(),"C")'%div)
Any ideas? (I am using lxml and requests from Python, XPath 1.0.)
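One way to think about this: each td holds several text nodes separated by <br /> elements, and only the first one is wanted. In XPath 1.0 that can be written as text()[1] on the td step, since the predicate is applied per td. In the ElementTree-style API (which lxml elements also follow), the same first text node is exposed as the element's .text attribute, while the later strings sit in the .tail of each <br />. The sketch below shows the .text variant using the standard library on a simplified, well-formed sample; the SAMPLE string and first_strings helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for one of the divs in the question.
SAMPLE = """
<div id="m1">
<table class="fl">
<tr><td class="ch"><a class="A3">Sport Klub HD Poland</a></td>
<td class="cr">Conax<br />Mediaguard 3<br />Nagravision 3<br />Viaccess 3.0</td></tr>
<tr><td class="ch"><a class="A3">Animal Planet HD</a></td>
<td class="cr">Conax<br />Irdeto 2<br />Mediaguard 3</td></tr>
</table>
</div>
"""


def first_strings(markup):
    """Return only the first text node of every td with class 'cr'."""
    root = ET.fromstring(markup)
    cells = root.findall("./table[@class='fl']/tr/td[@class='cr']")
    # .text is the text before the first child element (<br />);
    # entries such as 'Irdeto 2' are stored in the .tail of each <br />.
    return [cell.text.strip() for cell in cells if cell.text]


encrypted = first_strings(SAMPLE)
print(encrypted)
```

The substring-before attempt returns nothing useful because, in XPath 1.0, a string function applied to a node-set only looks at the first node, and the text before the first "C" in "Clear" or "Conax" is empty.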
I installed BeautifulSoup, read the documentation and found some tutorials on getting info from a table, but only from basic tables with a couple rows and columns.
I'm having trouble understanding how to do something more complex.
I have an html doc and use
table = soup.findAll("table")
To find the table. What I get back is a bunch of listings looking like this:
<tbody class="item" data-buyout="8 exalted" data-ign="POOPOODOODOO" data-league="Standard"
data-name="Dusk Stone Cobalt Jewel" data-seller="LooseSausage" data-sellerid="3447078"
data-thread="1285903" id="item-container-0">
<tr class="first-line"> <td class="icon-td"> <div class="icon"><img alt="Item icon" src="http://webcdn.pathofexile.com/image/Art/2DItems/Jewels/basicint.png?scale=1&w=1&h=1&v=cd579ea22c05f1c6ad2fd015d7a710bd3">\n<div class="sockets" style="position: absolute;">
\n<div class="sockets-inner" style="position: relative; width:94px;">\n</div>\n</div></img></div> </td>
<td class="item-cell">
<h5><a class="title itemframe2" href="http://www.pathofexile.com/forum/view-thread/1285903" target="_blank"> Dusk Stone Cobalt Jewel </a></h5><p class="requirements"> </p> <span class="sockets-raw" style="display:none"></span> <ul class="item-mods"><li class="bullet-item"><ul class="mods">
<li class="sortable " data-name="##% increased Attack Speed" data-value="4.0" style=""><b>4</b>% increased Attack Speed</li>
<li class="sortable " data-name="#Minions have #% increased maximum Life" data-value="10.0" style="">Minions have <b>10</b>% increased maximum Life</li>
<li class="sortable " data-name="##% increased Ignite Duration on Enemies" data-value="5.0" style=""><b>5</b>% increased Ignite Duration on Enemies</li><li class="sortable active" data-name="#Minions deal #% increased Damage" data-value="16.0" style="">Minions deal <b>16</b>% increased Damage</li>
<li class="sortable " data-name="##% chance to Ignite" data-value="2.0" style=""><b>2</b>% chance to Ignite</li></ul></li></ul> </td> <td class="table-stats"> <table> <tr class="calibrate"> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> <th></th> </tr> <tr class="cell-first"> <th class="disabled" colspan="2">Quality</th> <th class="disabled" colspan="2">Phys.</th> <th class="disabled" colspan="2">Elem.</th> <th class="disabled" colspan="2">APS</th> <th class="disabled" colspan="2">DPS</th> <th class="disabled" colspan="2">pDPS</th> <th class="disabled" colspan="2">eDPS</th> </tr> <tr class="cell-first"> <td class="sortable property " colspan="2" data-name="q" data-value="0"> 0<span class="capped">+20</span> </td> <td class="sortable property " colspan="2" data-name="quality_pd" data-value="0.0"> </td> <td class="sortable property " colspan="2" data-ed="" data-name="ed" data-value="0.0"> </td> <td class="sortable property " colspan="2" data-name="aps" data-value="0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="quality_dps" data-value="0.0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="quality_pdps" data-value="0.0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="edps" data-value="0.0"> \xa0 </td> </tr> <tr class="cell-second"> <th class="cell-empty"></th> <th class="disabled" colspan="2">Armour</th> <th class="disabled" colspan="2">Evasion</th> <th class="disabled" colspan="2">Shield</th> <th class="disabled" colspan="2">Block</th> <th class="disabled" colspan="2">Crit.</th> <th class="disabled" colspan="2">Level</th> </tr> <tr class="cell-second"> <td class="cell-empty"></td> <td class="sortable property " colspan="2" data-name="quality_armour" data-value="0.0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="quality_evasion" data-value="0.0"> \xa0 </td> <td class="sortable property " colspan="2" 
data-name="quality_shield" data-value="0.0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="block" data-value="0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="crit" data-value="0"> \xa0 </td> <td class="sortable property " colspan="2" data-name="level" data-value="0"> \xa0 </td> </tr> </table> </td> </tr> <tr class="bottom-row"> <td class="first-cell"></td> <td> <span class="requirements"> <span class="sortable" data-name="price_in_chaos" data-value="-212.0"><span class="has-tip currency currency-exalted" data-tooltip="" title="8 exalted">8\xd7</span></span> \xb7 <span class="click-button" data-hash="61053af5f4c3a82e330b3e38192b8480" data-thread="1285903" onclick="verify_modern(this)">Verify</span> \xb7 <span class="success label">online</span> IGN: POOPOODOODOO \xb7 <span class="has-tip" data-tooltip="" title="account age and highest level">a857h93</span> \xb7 PM \xb7 Whisper </span> </td> <td class="third-cell" colspan="16"></td> </tr> <tr><td class="item-separator" colspan="16"></td></tr>
</tbody>
What I would like to do is take the parts that say data-buyout, data-ign, data-name, etc. and save them in variables for later use. Then skip down to the li class="sortable" elements and get the data-name attribute or the text.
(Since they seem to be the same thing.)
I'm having trouble understanding how to do all of this within the tbody class="item" section.
I need to do this several times, since there are multiple items with tbody class="item" in each table.
Any information I can get is greatly appreciated!
You can get the data as follows:
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_string_above, "html.parser")
>>> soup.tbody.attrs
{'data-sellerid': '3447078', 'data-buyout': '8 exalted', 'data-league': 'Standard', 'data-name': 'Dusk Stone Cobalt Jewel', 'data-thread': '1285903', 'id': 'item-container-0', 'data-ign': 'POOPOODOODOO', 'data-seller': 'LooseSausage', 'class': ['item']}
>>> [x.get_text(strip=True) for x in soup.select('li')]
['4% increased Attack SpeedMinions have10% increased maximum Life5% increased Ignite Duration on EnemiesMinions deal16% increased Damage2% chance to Ignite', '4% increased Attack Speed', 'Minions have10% increased maximum Life', '5% increased Ignite Duration on Enemies', 'Minions deal16% increased Damage', '2% chance to Ignite']
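Note that the first entry in the list above is the outer li class="bullet-item", which repeats the text of all the nested items. To handle several items per table and skip that wrapper, one option is to loop over every tbody with class "item" and select only the li elements with class "sortable". The following is a rough sketch along those lines, assuming a trimmed copy of the markup from the question; the parse_items helper and the dictionary keys are made-up names.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one listing from the question (assumed sample data).
HTML = """
<table>
<tbody class="item" data-buyout="8 exalted" data-ign="POOPOODOODOO" data-name="Dusk Stone Cobalt Jewel" id="item-container-0">
<tr><td class="item-cell"><ul class="item-mods"><li class="bullet-item"><ul class="mods">
<li class="sortable " data-name="##% increased Attack Speed" data-value="4.0"><b>4</b>% increased Attack Speed</li>
<li class="sortable active" data-name="#Minions deal #% increased Damage" data-value="16.0">Minions deal <b>16</b>% increased Damage</li>
</ul></li></ul></td></tr>
</tbody>
</table>
"""


def parse_items(markup):
    """Collect the data-* attributes and mod texts of every tbody.item."""
    soup = BeautifulSoup(markup, "html.parser")
    items = []
    for body in soup.select("tbody.item"):
        items.append({
            "buyout": body.get("data-buyout"),
            "ign": body.get("data-ign"),
            "name": body.get("data-name"),
            # li.sortable skips the wrapping li.bullet-item element.
            "mods": [li.get_text().strip() for li in body.select("li.sortable")],
        })
    return items


items = parse_items(HTML)
print(items[0]["name"], items[0]["mods"])
```

Each listing becomes one dictionary, so the values can be kept in variables or written out later without re-parsing the page.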
The Beautiful Soup documentation provides the attributes .contents and .children to access the children of a given tag (a list and an iterator, respectively), and both include NavigableStrings as well as Tags. I want only the children of type Tag.
I'm currently accomplishing this using a list comprehension:
rows=[x for x in table.tbody.children if type(x)==bs4.element.Tag]
but I'm wondering if there is a better/more Pythonic/built-in way to get just the Tag children.
Thanks to J.F.Sebastian, the following will work:
rows=table.tbody.find_all(True, recursive=False)
Documentation here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#true
In my case, I needed actual rows in the table, so I ended up using the following, which is more precise and I think more readable:
rows=table.tbody.find_all('tr')
Again, docs: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names
I believe this is a better way than iterating through all the children of a Tag.
Worked with the following input:
<table cellspacing="0" cellpadding="0">
<thead>
<tr class="title-row">
<th class="title" colspan="100">
<div style="position:relative;">
President
<span class="pct-rpt">
99% reporting
</span>
</div>
</th>
</tr>
<tr class="header-row">
<th class="photo first">
</th>
<th class="candidate ">
Candidate
</th>
<th class="party ">
Party
</th>
<th class="votes ">
Votes
</th>
<th class="pct ">
Pct.
</th>
<th class="change ">
Change from ‘08
</th>
<th class="evotes last">
Electoral Votes
</th>
</tr>
</thead>
<tbody>
<tr class="">
<td class="photo first">
<div class="photo_wrap"><img alt="P-barack-obama" height="48" src="http://i1.nyt.com/projects/assets/election_2012/images/candidate_photos/election_night/p-barack-obama.jpg?1352320690" width="68" /></div>
</td>
<td class="candidate ">
<div class="winner dem"><img alt="Hp-checkmark@2x" height="9" src="http://i1.nyt.com/projects/assets/election_2012/images/swatches/hp-checkmark@2x.png?1352320690" width="10" />Barack Obama</div>
</td>
<td class="party ">
Dem.
</td>
<td class="votes ">
2,916,811
</td>
<td class="pct ">
57.3%
</td>
<td class="change ">
-4.6%
</td>
<td class="evotes last">
20
</td>
</tr>
<tr class="">
<td class="photo first">
</td>
<td class="candidate ">
<div class="not-winner">Mitt Romney</div>
</td>
<td class="party ">
Rep.
</td>
<td class="votes ">
2,090,116
</td>
<td class="pct ">
41.1%
</td>
<td class="change ">
+4.3%
</td>
<td class="evotes last">
0
</td>
</tr>
<tr class="">
<td class="photo first">
</td>
<td class="candidate ">
<div class="not-winner">Gary Johnson</div>
</td>
<td class="party ">
Lib.
</td>
<td class="votes ">
54,798
</td>
<td class="pct ">
1.1%
</td>
<td class="change ">
–
</td>
<td class="evotes last">
0
</td>
</tr>
<tr class="last-row">
<td class="photo first">
</td>
<td class="candidate ">
<div class="not-winner">Jill Stein</div>
</td>
<td class="party ">
Green
</td>
<td class="votes ">
29,336
</td>
<td class="pct ">
0.6%
</td>
<td class="change ">
–
</td>
<td class="evotes last">
0
</td>
</tr>
<tr>
<td class="footer" colspan="100">
President Map |
President Big Board |
Exit Polls
</td>
</tr>
</tbody>
</table>
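To see the difference between .children and find_all(True, recursive=False) on input like the table above, here is a small sketch on a shortened copy of it. The HTML string is an assumed, trimmed sample; the isinstance variant is shown only as the more usual spelling of the type check from the question.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened copy of the results table (assumed sample data).
HTML = """
<table>
<tbody>
<tr class=""><td class="candidate">Barack Obama</td><td class="votes">2,916,811</td></tr>
<tr class=""><td class="candidate">Mitt Romney</td><td class="votes">2,090,116</td></tr>
<tr><td class="footer" colspan="100">Exit Polls</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
tbody = soup.table.tbody

# .children yields the whitespace strings between rows as well as the <tr> tags.
with_strings = list(tbody.children)
# find_all(True, recursive=False) keeps only direct children that are tags.
tags_only = tbody.find_all(True, recursive=False)
# Equivalent filter using isinstance instead of comparing type() directly.
via_isinstance = [child for child in tbody.children if isinstance(child, Tag)]

print(len(with_strings), len(tags_only))
print([row.td.get_text(strip=True) for row in tags_only])
```

Unlike find_all('tr'), which also descends into nested tables, the recursive=False form stays at the direct-child level.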