Python xhtml2pdf: display table cell text vertically

I'm using xhtml2pdf to generate a report in Django, and I would like the text in one of my table cells to display vertically, but I couldn't get it to work with CSS.
Here is my attempt:
.vertical-text {
writing-mode: tb-rl;
}
<table>
<tbody>
<tr>
<td class="vertical-text" >V text</td>
</tr>
</tbody>
</table>
UPDATE
The writing-mode property is missing from the list of supported CSS properties. Is there any workaround?

There is another workaround using CSS (JSFiddle):
.vertical-text {
width:1px;
font-family: monospace;
white-space: pre-wrap; /* this is for displaying whitespaces including Firefox */
}
But there are a couple of drawbacks here:
There have to be spaces between the letters to ensure the text is displayed vertically.
The letters are not rotated; they keep their normal orientation and are simply stacked.
I would suggest using some other tool that does not restrict which CSS properties you can use, such as:
PDFKit
PhantomJS - write a custom Node.js server in the backend to do the rendering.
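Since the CSS workaround above only stacks the letters when they are separated by spaces, the text can be spaced out in Python before it goes into the template. This is a minimal sketch: space_out is a made-up helper name, whitespace in the input is dropped, and whether xhtml2pdf honours the width and white-space rules the same way a browser does is an assumption that has not been verified here.

```python
import html


def space_out(text):
    """Return text with one space between its non-whitespace characters."""
    # Whitespace in the input is dropped; each remaining character is
    # HTML-escaped so the result can be placed inside a <td>.
    return " ".join(html.escape(ch) for ch in text if not ch.isspace())


# Build the cell from the question with the spaced-out text.
cell = '<td class="vertical-text">' + space_out("V text") + "</td>"
print(cell)
```

The resulting string can be passed to the Django template in place of the original cell text.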

Related

How to define attributes like background colour

How can we set a background colour, or any other attribute, on a row or column in this?
There are three ways to set the background-color property: inline CSS, internal CSS, and external CSS. With inline CSS, for example:
<div style="background-color: red">
You can also apply this to a table row, as well as to a table cell, as shown below.
<tr style="background-color: red"><td>Table row</td></tr>
<td style="background-color: green">Table Column</td>

Click menu item with same ID using Selenium Python

Below are my HTML snippet and the code I tried. I need to click the Integrated Consoles menu item. I tried the code below, but nothing happens and no error is raised either. Please help me select the specific menu item using the text inside the tag.
driver.find_element_by_xpath(".//td[contains text(),'Integrated Consoles']").click()
HTML sample snippet
<td nowrap="" id="MENU_TD110"> Integrated Consoles </td>
<td nowrap="" id="MENU_TD110"> System Information </td>
<td nowrap="" id="MENU_TD110"> More Tools </td>
The parentheses () are missing from your contains() call. Enclose the arguments as shown below and try again:
driver.find_element_by_xpath(".//td[contains(text(),'Integrated Consoles')]").click()

How to generate graphics from Python (or printable tables)?

I would like to write a generator of multiplication tables in Python for my children. I imagine something like a 10x10 table with 20 or 30 of the cells randomly bolded (a thicker border). What would be a good method to generate the printable output?
I am tentatively thinking of generating a LaTeX file, but there may be a simpler (more Pythonic, fewer dependencies) solution?
UPDATE: if someone is interested in the code to generate the above, I posted it to bitbucket.org. This is an alpha version from a "Sunday developer", as we say in France (which means that the code is ugly and that you must not use it under any circumstances when developing space shuttle management software :))
You might want to use HTML and CSS instead of LaTeX; it's a little simpler and cleaner, and just as printable.
<html>
<head>
<style>
table {border-collapse: collapse}
td { border:1px solid black; }
td.bolded { border:3px solid black }
</style>
</head>
<body>
<table>
<tr>
<td> 1 </td> <td> 2 </td> <td class="bolded"> 3 </td>
</tr>
</table>
</body>
</html>
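The page above can be generated from Python with nothing but the standard library. The following is a rough sketch, not the asker's posted code: the function name multiplication_table_html and its parameters (size, n_bold, seed) are invented for illustration, and the CSS is copied from the answer above.

```python
import random


def multiplication_table_html(size=10, n_bold=25, seed=None):
    """Return an HTML page with a size x size multiplication table.

    n_bold randomly chosen cells get the "bolded" CSS class.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)]
    bolded = set(rng.sample(cells, n_bold))  # distinct cells to highlight

    rows = []
    for r in range(1, size + 1):
        tds = []
        for c in range(1, size + 1):
            cls = ' class="bolded"' if (r, c) in bolded else ""
            tds.append(f"<td{cls}> {r * c} </td>")
        rows.append("<tr>" + " ".join(tds) + "</tr>")

    style = (
        "table {border-collapse: collapse} "
        "td { border:1px solid black; } "
        "td.bolded { border:3px solid black }"
    )
    return (
        "<html><head><style>" + style + "</style></head>"
        "<body><table>\n" + "\n".join(rows) + "\n</table></body></html>"
    )


# A fixed seed makes the choice of bolded cells reproducible.
page = multiplication_table_html(seed=0)
```

Writing page to a file such as table.html and opening it in a browser gives a printable sheet; changing the seed produces a different set of bolded cells.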

Python pattern matching

I'm currently in the process of converting an old bash script of mine into a Python script with added functionality. I've been able to do most things, but I'm having a lot of trouble with Python pattern matching.
In my previous script, I downloaded a web page and used sed to get the element I wanted. The matching was done like so (for one of the values I wanted):
PM_NUMBER=`cat um.htm | LANG=sv_SE.iso88591 sed -n 's/.*ol.st.*pm.*count..\([0-9]*\).*/\1/p'`
It would match the number wrapped in <span class="count"></span> after the phrase "olästa pm". The markup I'm running this against is:
<td style="padding-left: 11px;">
<a href="/abuse_list.php">
<img src="/gfx/abuse_unread.png" width="15" height="12" alt="" title="9 anmälningar" />
</a>
</td>
<td align="center">
<a class="page_login_text" href="/pm.php" title="Du har 3 olästa pm.">
<span class="count">3</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/blogg_latest.php" title="Du har 1 ny bloggkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_guestbook.php" title="Min gästbok">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum.php?view=3" title="Du har 1 ny forumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_images.php?user_id=162005&func=display_new_comments" title="Du har 1 ny albumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum_favorites.php" title="Du har 2 uppdaterade trådar i "bevakade trådar"">
<span class="count">2</span>
</td>
I'm hesitant to post this, because it seems like I'm asking for a lot, but could someone please help me with a way to parse this in Python? I've been pulling my hair trying to do this, but regular expressions and I just don't match (pardon the pun). I've spent the last couple of hours experimenting and reading the Python manual on regular expressions, but I can't seem to figure it out.
Just to make it clear, what I need are 7 different expressions for matching the number within <span class="count"></span>. I need to, for example, be able to find the number of unread PMs ("olästa pm").
Do not parse the HTML yourself. Use an HTML parser written in Python to parse it instead:
Lightweight xml dom parser in python
Beautiful Soup
You can use lxml to pull out the values you are looking for pretty easily with XPath expressions:
lxml
xpath
Example
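from lxml import html

# Read the saved page as bytes so lxml can detect the encoding itself.
with open("um.htm", "rb") as f:
    page = html.fromstring(f.read())
# Select the <span> inside each link whose title contains 'pm.' or 'ol'.
matches = page.xpath("//a[contains(@title, 'pm.') or contains(@title, 'ol')]/span")
print([elem.text for elem in matches])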
Use either:
BeautifulSoup
lxml
Parsing HTML with regexes is a recipe for disaster.
It is impossible to reliably match HTML using regular expressions. It is usually possible to cobble something together that works for a specific page, but it is not advisable as even a subtle tweak to the source HTML can render all your work useless. HTML simply has a more complex structure than Regex is capable of describing.
The proper solution is to use a dedicated HTML parser. Note that even XML parsers won't do what you need, not reliably anyway. Valid XHTML is valid XML, but even valid HTML is not, even though it's quite similar. And valid HTML/XHTML is nearly impossible to find in the wild anyway.
There are a few different HTML parsers available:
BeautifulSoup is not in the standard library, but it is the most forgiving parser: it can handle almost all real-world HTML, and it is designed to do exactly what you're trying to do.
HTMLParser is included in the Python standard library, but it is fairly strict about accepting only valid HTML.
htmllib is also in the standard library, but is deprecated.
As other people have suggested, BeautifulSoup is almost certainly your best choice.
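For comparison, here is a minimal sketch of the parser-based approach using only the standard library's html.parser module (Python 3), in case installing BeautifulSoup or lxml is not an option. The class name CountParser and the inline sample string are made up for illustration; the sample is a shortened copy of the markup from the question.

```python
from html.parser import HTMLParser


class CountParser(HTMLParser):
    """Map each link title to the number in its <span class="count">."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self._title = None      # title of the most recent <a> tag
        self._in_count = False  # True while inside <span class="count">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._title = attrs.get("title")
        elif tag == "span" and attrs.get("class") == "count":
            self._in_count = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_count = False

    def handle_data(self, data):
        if self._in_count and self._title and data.strip().isdigit():
            self.counts[self._title] = int(data)


sample = '''
<td align="center">
<a class="page_login_text" href="/pm.php" title="Du har 3 olästa pm.">
<span class="count">3</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/blogg_latest.php" title="Du har 1 ny bloggkommentar">
<span class="count">1</span>
</td>
'''

parser = CountParser()
parser.feed(sample)
parser.close()

# Pick out the unread-PM count by a phrase in the link title.
unread_pm = next(n for title, n in parser.counts.items() if "olästa pm" in title)
print(unread_pm)
```

Keying the counts by link title means each of the seven values can be looked up by a phrase in its title, rather than needing a separate regular expression per value.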

How do I find all cells with a particular attribute in BeautifulSoup?

I am trying to develop a script to pull some data from a large number of HTML tables. One problem is that the number of rows that contain the information needed to create the column headings is indeterminate. I have discovered that, in the last of the header rows, each cell with a value has the attribute border-bottom, so I decided to find the cells with that attribute. As you can see, I initialized a list; I intend to find the parent of each cell that ends up in the borderCells list. However, when I run this code, only one cell (the first cell in allCells with the attribute border-bottom) is added to borderCells. For your information, allCells has 193 cells and 9 of them have the border-bottom attribute, so I was expecting nine members in borderCells. Any help is appreciated.
borderCells = []
for each in allCells:
    if each.find(attrs={"style": re.compile("border-bottom")}):
        borderCells.append(each)
Is there any reason
borderCells = soup.findAll("td", style=re.compile("border-bottom"))
wouldn't work? It's kind of hard to figure out exactly what you're asking for, since your description of the original tables is pretty ambiguous, and it's not really clear what allCells is supposed to be either.
I would suggest giving a representative sample of the HTML you're working with, along with the "correct" results pulled from that table.
Well, you know, computers are always right. The answer is that the attrs are on different things in the HTML. What I was modeling on was some HTML that looked like this:
<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>
The other places in the file where style="border-bottom ..." appears look like this:
<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>
So now I have to modify the question to figure out how to identify those cells where the attr is at the td level, not the div level.
Someone took away one of their answers, though I tested it and it worked for me. Thanks for the help. Both answers worked, and I learned a little bit more about how to post questions. After I stare at the code for a while, I might learn more about Python and BeautifulSoup.
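To make the td-level versus div-level distinction concrete, here is a small sketch that checks only each cell's own style attribute. It deliberately swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages; BorderCellFinder is a made-up name, and the sample is the two fragments quoted above.

```python
from html.parser import HTMLParser


class BorderCellFinder(HTMLParser):
    """Record <td> start tags whose own style attribute mentions border-bottom."""

    def __init__(self):
        super().__init__()
        self.border_cells = []  # attribute dicts of the matching <td> tags

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <TD> arrives as "td".
        if tag == "td":
            attrs = dict(attrs)
            if "border-bottom" in (attrs.get("style") or ""):
                self.border_cells.append(attrs)


sample = '''
<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>
<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>
'''

finder = BorderCellFinder()
finder.feed(sample)
finder.close()

# Only the second cell carries the style on the <td> itself.
print(len(finder.border_cells))
```

The first cell is skipped because its border-bottom style sits on the nested DIV. That is also why each.find(...) in the question behaves differently: find searches a cell's descendants, not the cell's own attributes.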
