How do I find all cells with a particular attribute in BeautifulSoup? - python

I am trying to develop a script to pull data from a large number of HTML tables. One problem is that the number of rows used to build the column headings varies from table to table. I have found that in the last header row every cell with a value carries a border-bottom style, so I decided to find the cells with that style and then take the parent of each cell that ends up in the borderCells list. However, when I run the code below, only one cell is added to borderCells: the first cell in allCells with border-bottom. allCells has 193 cells and 9 of them have border-bottom, so I was expecting nine members in the borderCells list. Any help is appreciated.
import re

borderCells = []
for each in allCells:
    # keep every cell for which find() returns a match
    if each.find(attrs={"style": re.compile("border-bottom")}):
        borderCells.append(each)

Is there any reason
borderCells = soup.findAll("td", style=re.compile("border-bottom"))
wouldn't work? It's kind of hard to figure out exactly what you're asking for, since your description of the original tables is pretty ambiguous, and it's not really clear what allCells is supposed to be either.
I would suggest giving a representative sample of the HTML you're working with, along with the "correct" results pulled from that table.

Well, you know computers are always right. The answer is that the style attributes are on different elements in the HTML. What I was modeling on was some HTML that looked like this:
<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>
The other places in the file where style="border-bottom ..." appears look like this:
<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>
So now I have to modify the question: how do I identify those cells where the attribute is at the td level rather than the div level?
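One way to separate the two cases, as a minimal sketch. The sample table is put together from the two snippets above plus one hypothetical plain cell, and it assumes bs4 with the built-in html.parser backend. Passing style= to find_all("td", ...) tests the td's own attribute, while calling find() on a cell searches only that cell's descendants, which would explain why the original loop picked up only the cell with the styled DIV.

```python
import re
from bs4 import BeautifulSoup

# Sample built from the snippets in the question; the third cell is hypothetical.
SAMPLE_HTML = """
<table><tr>
<td nowrap align="left" valign="bottom">
<div style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<b>Name</b>
</div>
</td>
<td colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<b>Location</b>
</td>
<td><b>Plain</b></td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
border = re.compile("border-bottom")

# Cells whose own style attribute mentions border-bottom.
td_level = soup.find_all("td", style=border)

# Cells that contain a styled descendant (what each.find(...) tests).
div_level = [td for td in soup.find_all("td") if td.find(style=border)]

td_texts = [td.get_text(strip=True) for td in td_level]
div_texts = [td.get_text(strip=True) for td in div_level]
print(td_texts)
print(div_texts)
```

Combining the two lists would give every header cell regardless of where the style sits.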

Someone took away one of their answers, though I tested it and it worked for me. Thanks for the help. Both answers worked, and I learned a little more about how to post questions. After I stare at the code for a while I might learn more about Python and BeautifulSoup.

Related

How to define Attributes like background colour

How can we set a background colour, or any other attribute, on a row or column here?
There are three ways to set the background-color property: inline CSS, internal CSS and external CSS. With inline CSS:
<div style="background-color: red">
You can also apply this to a table row or a table column, as below.
<tr style="background-color: red">Table row</tr>
<td style="background-color: green">Table Column</td>

Retrieve value from span class XPath

I am trying to scrape some information from this website https://www.gumtree.co.za (https://www.gumtree.co.za/a-house-rentals-flat-rentals-offered/tamboerskloof/studio-flatlet-in-tamboerskloof/1005754794350910092234609 this is the link of the property I am taking information from); more specifically I am trying to take information from these span classes:
<div class="attribute">
<span class="name">Bathrooms (#):</span>
<span class="value">1</span>
</div>
I first want to check if the span class has Bathroom in it and then take the value for that. This is what I have right now:
bathrooms=response.xpath("//span[contains(text(),'Bathrooms')]/span[@class='value']text()").extract_first()
However, I do not get anything.
Any suggestions? Thank you!
The value span is a sibling of the name span, not a child of it, so use the following-sibling axis and then take its text:
bathrooms = response.xpath("//span[contains(text(),'Bathrooms')]/following-sibling::span[@class='value']/text()").extract_first()
For more, you can refer to this: XPath Axes
Hope this helps.
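For checking the pairing logic without Scrapy, here is a small sketch that uses the standard library's xml.etree.ElementTree instead of response.xpath. That is a swapped-in technique, not the Scrapy API: ElementTree understands only a subset of XPath (no contains() and no following-sibling axis), so the label test is done in Python. The markup is the fragment from the question wrapped in a hypothetical root element, the second attribute block is made up for contrast, and attribute_value is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Fragment from the question inside a hypothetical <root>;
# the "Bedrooms" block is invented for contrast.
SNIPPET = """
<root>
  <div class="attribute">
    <span class="name">Bathrooms (#):</span>
    <span class="value">1</span>
  </div>
  <div class="attribute">
    <span class="name">Bedrooms (#):</span>
    <span class="value">2</span>
  </div>
</root>
"""


def attribute_value(root, label):
    """Return the value text paired with the first name span containing label."""
    for div in root.iterfind(".//div[@class='attribute']"):
        name = div.find("span[@class='name']")
        value = div.find("span[@class='value']")
        if name is None or value is None:
            continue
        if label in (name.text or ""):
            return (value.text or "").strip()
    return None  # no attribute block carries that label


root = ET.fromstring(SNIPPET)
bathrooms = attribute_value(root, "Bathrooms")
print(bathrooms)
```

The name and value spans sit side by side under the same div, which is why a child step from the name span finds nothing.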

Python xhtml2pdf table cell text display to vertical

I'm using xhtml2pdf to generate a report in Django and I would like the text in one of my cells to display vertically, but I couldn't make it work using CSS.
Here is my attempt:
.vertical-text {
    writing-mode: tb-rl;
}
<table>
    <tbody>
        <tr>
            <td class="vertical-text">V text</td>
        </tr>
    </tbody>
</table>
UPDATE
The writing-mode property is missing from the supported CSS properties. Is there any workaround?
There is another workaround, where we can use CSS: JSFiddle
.vertical-text {
    width: 1px;
    font-family: monospace;
    white-space: pre-wrap; /* preserves whitespace, including in Firefox */
}
But there are a couple of downsides here:
There have to be spaces between the letters to ensure the text is displayed vertically.
The letters are not rotated; they keep their normal orientation.
I would suggest using some other tool where you are not restricted in the CSS properties you can use, such as:
PDFKit
PhantomJS - write a custom Node.js server in the backend to do the rendering.

click Menu item with same ID using Selenium Python

Below is my HTML snippet and the code I tried. I need to click the Integrated Consoles menu item. I tried the following, but nothing happens and no error is raised. Kindly help me select a specific menu item using the text inside the tag.
driver.find_element_by_xpath(".//td[contains text(),'Integrated Consoles']").click()
HTML sample snippet
<td nowrap="" id="MENU_TD110"> Integrated Consoles </td>
<td nowrap="" id="MENU_TD110"> System Information </td>
<td nowrap="" id="MENU_TD110"> More Tools </td>
The parentheses () are missing from your contains() call. Enclose its arguments as below and try again:
driver.find_element_by_xpath(".//td[contains(text(),'Integrated Consoles')]").click()

how to generate graphics from python (or printable tables)?

I would like to write in python a generator of multiplication tables for my children. I imagine something like a 10x10 table with 20 or 30 of the cells randomly bolded (a thicker border). What would be a good method to generate the printable output?
I am tentatively thinking of generating a LaTeX file, but there may be a simpler (more pythonic, fewer dependencies) solution?
UPDATE: if someone is interested in the code to generate the above, I posted it to bitbucket.org. This is an alpha version from a "Sunday developer", as we say in France (which means that the code is ugly and that you must not use it under any circumstances when developing space shuttle management software :))
You might want to use HTML and CSS instead of LaTeX; it's a little simpler and cleaner, and just as printable.
<html>
<head>
<style>
table {border-collapse: collapse}
td { border:1px solid black; }
td.bolded { border:3px solid black }
</style>
</head>
<body>
<table>
<tr>
<td> 1 </td> <td> 2 </td> <td class="bolded"> 3 </td>
</tr>
</table>
</body>
</html>
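To produce that kind of page from Python, something along these lines might do. It is only a sketch using the standard library: build_table_html and its parameters (size, n_bold, seed) are names invented here, and the CSS is copied from the markup above.

```python
import random

# Same rules as the stylesheet above.
STYLE = (
    "table {border-collapse: collapse}\n"
    "td { border:1px solid black; }\n"
    "td.bolded { border:3px solid black }\n"
)


def build_table_html(size=10, n_bold=25, seed=None):
    """Return an HTML page holding a size x size multiplication table
    in which n_bold randomly chosen cells get the 'bolded' class."""
    rng = random.Random(seed)  # pass a seed to get a repeatable sheet
    coords = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)]
    bolded = set(rng.sample(coords, n_bold))  # distinct cells, no repeats

    rows = []
    for r in range(1, size + 1):
        cells = []
        for c in range(1, size + 1):
            cls = ' class="bolded"' if (r, c) in bolded else ""
            cells.append(f"<td{cls}> {r * c} </td>")
        rows.append("<tr>" + " ".join(cells) + "</tr>")

    return (
        "<html><head><style>\n" + STYLE + "</style></head>\n"
        "<body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n"
    )


page = build_table_html(seed=0)
```

Writing the returned string to a file, for example with pathlib.Path("table.html").write_text(page), gives something a browser can open and print.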
