python-docx trailing whitespaces not showing correctly - python

Goal
I am trying to add text to a table cell. The text is a combination of two strings, with a variable number of spaces between them so that the final text always has the same length and the second string appears to be right-aligned.
I can either use format or ljust to combine the strings in python.
period = "from Monday to Friday"
item_text = "Some txt"
item_text2 = "Some other txt"
t1 = "t1: {:<30}{:0}".format(item_text,period)
t2 = "t2: {:<30}{:0}".format(item_text2,period)
t3 = f"t3: {item_text.ljust(30)}{period}"
t4 = f"t4: {item_text2.ljust(30)}{period}"
from pprint import pprint
pprint(t1)
pprint(t2)
pprint(t3)
pprint(t4)
Text in python with variable space length between strings
However, if I add this text to a docx table, the space between the strings changes.
from docx import Document
doc = Document()
# Creating a table object
table = doc.add_table(rows=2, cols=2, style="Table Grid")
table.rows[0].cells[0].text = f"{item_text.ljust(30)}{period}"
table.rows[1].cells[0].text = f"{item_text2.ljust(30)}{period}"
from docx.shared import Cm
def set_col_widths(table):
    # Column widths in centimetres, applied cell by cell
    widths = tuple(Cm(val) for val in [15, 8])
    for row in table.rows:
        for idx, width in enumerate(widths):
            row.cells[idx].width = width
set_col_widths(table)
doc.save("test_whitespace.docx")
Text in word. Space between strings changed.
Note
I am aware that I could add a nested table to the table cell, left-align its left cell and right-align its right cell, but that seems like a lot more code to write.
Question
Why is the spacing changing in the word document and how can I create the text differently to get the desired goal?

Related

How to create a text shape with python pptx?

I want to add a text box to a presentation with python-pptx. I would like to add a text box with several paragraphs in a specific place and then format it (fonts, color, etc.). But since a new text box always starts out with one paragraph, I cannot format the first of my paragraphs. The code sample looks like this:
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame
p = tf.add_paragraph()
p.text = "This is a first paragraph"
p.font.size = Pt(11)
p = tf.add_paragraph()
p.text = "This is a second paragraph"
p.font.size = Pt(11)
Which creates output like this:
I can add text to this first line with tf.text = "This is text inside a textbox", but then I cannot set its font or color. Is there any way to omit or edit that line, so that all paragraphs in the box look the same?
Access the first paragraph differently, using:
p = tf.paragraphs[0]
Then you can add runs, set fonts and all the rest of it just like with a paragraph you get back from tf.add_paragraph().
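Putting that together, a minimal sketch of a helper that reuses the existing first paragraph and only calls add_paragraph() for the remaining ones. The name set_paragraphs and its parameters are made up for illustration; it only relies on tf.paragraphs, tf.add_paragraph(), p.text and p.font.size, which the code in the question already uses.

```python
def set_paragraphs(text_frame, texts, font_size=None):
    """Write one paragraph per entry in texts and return the paragraphs used."""
    paragraphs = []
    for index, text in enumerate(texts):
        if index == 0:
            # A new text frame already holds one empty paragraph; reuse it.
            p = text_frame.paragraphs[0]
        else:
            p = text_frame.add_paragraph()
        p.text = text
        if font_size is not None:
            # Pass a length such as Pt(11) from pptx.util when using python-pptx.
            p.font.size = font_size
        paragraphs.append(p)
    return paragraphs
```

With the text box from the question this would be called as set_paragraphs(tf, ["This is a first paragraph", "This is a second paragraph"], Pt(11)), so every paragraph, including the first, gets the same formatting.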

Iterating through cell sentences/paragraphs - docx tables

I am looking to iterate through sentences/paragraphs within cells of a docx table, performing functions depending on their style tags using the pywin32 module.
I can manually select the cell using
cell = table.Cell(Row = 1, Column =2)
I tried using something like for x in cell: #do something but
<class 'win32com.client.CDispatch'> objects 'do not support enumeration'
I tried looking through: Word OM to find a solution but to no avail (I understand this is for VBA, but still can be very useful)
Here is a simple example that reads the content of the first row / first column of the first table in a document and prints it word by word:
import win32com.client as win32
import os
wordApp = win32.gencache.EnsureDispatch("Word.Application")
wordApp.Visible = False
doc = wordApp.Documents.Open(os.getcwd() + "\\Test.docx")
table = doc.Tables(1)
for word in table.Cell(Row = 1, Column = 1).Range.Text.split():
    print(word)
wordApp.Application.Quit(-1)
The cell's content is just a string, so you could just as easily split it into paragraphs using split('\r') or into sentences using split('.').
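A minimal sketch of that splitting step, using a hard-coded string in place of table.Cell(...).Range.Text. It assumes the cell text ends with Word's end-of-cell marker (a carriage return followed by the character '\x07'), which is stripped first; the helper name split_cell_text is made up for illustration.

```python
def split_cell_text(cell_text):
    """Split the raw text of a Word table cell into paragraphs and sentences."""
    # Assumption: the text ends with the end-of-cell marker "\r\x07".
    cleaned = cell_text.rstrip("\r\x07")
    # Paragraphs inside a cell are separated by a carriage return.
    paragraphs = [p for p in cleaned.split("\r") if p.strip()]
    # Naive sentence split on "."; abbreviations and decimals get split as well.
    sentences = [
        [s.strip() for s in p.split(".") if s.strip()]
        for p in paragraphs
    ]
    return paragraphs, sentences


# Stand-in for table.Cell(Row=1, Column=1).Range.Text
raw = "First paragraph. It has two sentences.\rSecond paragraph.\r\x07"
paragraphs, sentences = split_cell_text(raw)
print(paragraphs)
print(sentences)
```

Note that a plain string carries no style information. If the style of each paragraph matters, the Range.Paragraphs collection of the Word object model (each item has a Style property) is probably the better place to look.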

docx center text in table cells

I am starting to use Python's docx library. I create a table with multiple rows and only 2 columns; it looks like this:
Now, I would like the text in those cells to be centered horizontally. How can I do this? I've searched through docx API documentation but I only saw information about aligning paragraphs.
You can do this by setting the alignment as you create the cells:
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# add_table takes the keyword cols, not columns
table = doc.add_table(rows=0, cols=2)
row = table.add_row().cells
p = row[0].add_paragraph('left justified text')
p.alignment = WD_ALIGN_PARAGRAPH.LEFT
p = row[1].add_paragraph('right justified text')
p.alignment = WD_ALIGN_PARAGRAPH.RIGHT
code by: bnlawrence
and to align text to the center just change:
p.alignment=WD_ALIGN_PARAGRAPH.CENTER
solution found here: Modify the alignment of cells in a table
Well, it seems that adding a paragraph works, but (oh, really?) it adds a new paragraph, so in my case it wasn't an option. You could instead change the text of the existing cell and then change the alignment of its paragraph:
row[0].text = "hey, beauty"
p = row[0].paragraphs[0]
p.alignment = docx.enum.text.WD_ALIGN_PARAGRAPH.CENTER
Actually, in the top answer this first "docx.enum.text" was missing :)
The most reliable way that I have found for setting the alignment of a table cell (or really any text property) is through styles. Define a style for center-aligned text in your document stub, either programmatically or through the Word UI. Then it just becomes a matter of applying the style to your text.
If you create the cell by setting its text property, you can just do
for col in table.columns:
    for cell in col.cells:
        cell.paragraphs[0].style = 'My Center Aligned Style'
If you have more advanced contents, you will have to add another loop to your function:
for col in table.columns:
    for cell in col.cells:
        for par in cell.paragraphs:
            par.style = 'My Center Aligned Style'
You can easily stick this code into a function that will accept a table object and a style name, and format the whole thing.
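A minimal sketch of such a function. The name apply_paragraph_style and the returned count are my own additions for illustration; the loops are the same as in the snippet for cells with more advanced contents.

```python
def apply_paragraph_style(table, style_name):
    """Apply style_name to every paragraph in every cell of table.

    Returns the number of paragraph assignments made.
    """
    count = 0
    for col in table.columns:
        for cell in col.cells:
            for par in cell.paragraphs:
                par.style = style_name
                count += 1
    return count
```

It would be called as apply_paragraph_style(table, 'My Center Aligned Style'). As far as I know, the style has to exist in the document already, otherwise the assignment fails.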
In my case I used this.
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt
def addCellText(row_cells, index, text):
    # Set the cell text, left-align it and use a 10 pt font
    row_cells[index].text = str(text)
    paragraph = row_cells[index].paragraphs[0]
    paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
    font = paragraph.runs[0].font
    font.size = Pt(10)
def addCellTextRight(row_cells, index, text):
    # Same as addCellText, but right-aligned
    row_cells[index].text = str(text)
    paragraph = row_cells[index].paragraphs[0]
    paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    font = paragraph.runs[0].font
    font.size = Pt(10)
For total alignment to center I use this code:
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.table import WD_ALIGN_VERTICAL
for row in table.rows:
    for cell in row.cells:
        cell.paragraphs[0].alignment = WD_ALIGN_PARAGRAPH.CENTER
        cell.vertical_alignment = WD_ALIGN_VERTICAL.CENTER
from docx.enum.table import WD_TABLE_ALIGNMENT
table = document.add_table(3, 3)
table.alignment = WD_TABLE_ALIGNMENT.CENTER
For details, see:
http://python-docx.readthedocs.io/en/latest/api/enum/WdRowAlignment.html

How do I find the formatting for a subset of text in an Excel document cell

Using Python, I need to find all substrings in a given Excel sheet cell that are either bold or italic.
My problem is similar to this:
Using XLRD module and Python to determine cell font style (italics or not)
..but the solution is not applicable for me as I cannot assume that the same formatting holds for all content in the cell. The value in a single cell can look like this:
1. Some bold text Some normal text. Some italic text.
Is there a way to find the formatting of a range of characters in a cell using xlrd (or any other Python Excel module)?
Thanks to @Vyassa for all of the right pointers, I've been able to write the following code, which iterates over the rows in an XLS file and outputs style information for cells with "single" style information (e.g., the whole cell is italic) or style "segments" (e.g., part of the cell is italic, part of it is not).
import xlrd
# accessing Column 'C' in this example
COL_IDX = 2
book = xlrd.open_workbook('your-file.xls', formatting_info=True)
first_sheet = book.sheet_by_index(0)
for row_idx in range(first_sheet.nrows):
    text_cell = first_sheet.cell_value(row_idx, COL_IDX)
    text_cell_xf = book.xf_list[first_sheet.cell_xf_index(row_idx, COL_IDX)]
    # skip rows where cell is empty
    if not text_cell:
        continue
    print(text_cell, end=' ')
    text_cell_runlist = first_sheet.rich_text_runlist_map.get((row_idx, COL_IDX))
    if text_cell_runlist:
        print('(cell multi style) SEGMENTS:')
        segments = []
        for segment_idx in range(len(text_cell_runlist)):
            start = text_cell_runlist[segment_idx][0]
            # the last segment starts at given 'start' and ends at the end of the string
            end = None
            if segment_idx != len(text_cell_runlist) - 1:
                end = text_cell_runlist[segment_idx + 1][0]
            segment_text = text_cell[start:end]
            segments.append({
                'text': segment_text,
                'font': book.font_list[text_cell_runlist[segment_idx][1]]
            })
        # segments did not start at beginning, assume cell starts with text styled as the cell
        if text_cell_runlist[0][0] != 0:
            segments.insert(0, {
                'text': text_cell[:text_cell_runlist[0][0]],
                'font': book.font_list[text_cell_xf.font_index]
            })
        for segment in segments:
            print(segment['text'], end=' ')
            print('italic:', segment['font'].italic, end=' ')
            print('bold:', segment['font'].bold)
    else:
        print('(cell single style)', end=' ')
        print('italic:', book.font_list[text_cell_xf.font_index].italic, end=' ')
        print('bold:', book.font_list[text_cell_xf.font_index].bold)
xlrd can do this. You must call open_workbook() with the kwarg formatting_info=True; the sheet objects will then have an attribute rich_text_runlist_map, which is a dictionary mapping cell coordinates ((row, col) tuples) to a runlist for that cell. A runlist is a sequence of (offset, font_index) pairs, where offset tells you where in the cell the font begins and font_index indexes into the workbook object's font_list attribute (the workbook object is what open_workbook() returns). Each entry of font_list is a Font object describing the properties of the font, including bold, italics, typeface, size, etc.
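To make the runlist layout concrete, here is a small sketch that turns a runlist into text segments without needing a workbook. The function name split_by_runlist, the sample text and the font indices 5 and 0 are made up for illustration; in real code the runlist would come from rich_text_runlist_map and the default font index from the cell's XF record.

```python
def split_by_runlist(text, runlist, default_font_index):
    """Split text into (segment_text, font_index) pairs using (offset, font_index) runs."""
    if not runlist:
        # No rich text runs: the whole cell uses the cell's own font.
        return [(text, default_font_index)]
    segments = []
    first_offset = runlist[0][0]
    if first_offset > 0:
        # Text before the first run is styled with the cell's own font.
        segments.append((text[:first_offset], default_font_index))
    for i, (offset, font_index) in enumerate(runlist):
        # A run ends where the next one starts; the last run goes to the end.
        end = runlist[i + 1][0] if i + 1 < len(runlist) else None
        segments.append((text[offset:end], font_index))
    return segments


text = "Some bold text Some normal text."
runlist = [(0, 5), (15, 0)]  # hypothetical font indices
for segment, font_index in split_by_runlist(text, runlist, default_font_index=0):
    print(repr(segment), font_index)
```

Each font_index can then be looked up in the workbook's font_list to read the bold and italic flags.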
I don't know if you can do that with xlrd, but since you ask about any other Python Excel module: openpyxl cannot do this in version 1.6.1.
The rich text gets reconstructed away in the function get_string() in openpyxl/reader/strings.py. It would be relatively easy to set up a second table with 'raw' strings in that module.

How to add space between lines within a single paragraph with Reportlab

I have a block of text that is dynamically pulled from a database and is placed in a PDF before being served to a user. The text is being placed onto a lined background, much like notepad paper. I want to space the text so that only one line of text is between each background line.
I was able to use the following code to create a vertical spacing between paragraphs (used to generate another part of the PDF).
style = getSampleStyleSheet()['Normal']
style.fontName = 'Helvetica'
style.spaceAfter = 15
style.alignment = TA_JUSTIFY
story = [Paragraph(choice.value,style) for choice in chain(context['question1'].itervalues(),context['question2'].itervalues())]
generated_file = StringIO()
frame1 = Frame(50,100,245,240, showBoundary=0)
frame2 = Frame(320,100,245,240, showBoundary=0)
page_template = PageTemplate(frames=[frame1,frame2])
doc = BaseDocTemplate(generated_file,pageTemplates=[page_template])
doc.build(story)
However, this won't work here because I have only a single, large paragraph.
Pretty sure what you want to change is the leading. From chapter 6 of the user manual:
To get double-spaced text, use a high leading. If you set autoLeading (default "off") to "min" (use observed leading even if smaller than specified) or "max" (use the larger of observed and specified) then an attempt is made to determine the leading on a line by line basis. This may be useful if the lines contain different font sizes etc.
Leading is defined earlier in chapter 2:
Interline spacing (Leading)
The vertical offset between the point at which one line starts and where the next starts is called the leading offset.
So try different values of leading, for example:
style = getSampleStyleSheet()['Normal']
style.leading = 24
Add leading to ParagraphStyle
from reportlab.lib import colors
from reportlab.lib.styles import ParagraphStyle
orden = ParagraphStyle('orden')
orden.leading = 14
orden.borderPadding = 10
orden.backColor = colors.gray
orden.fontSize = 14
Generate PDF
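from io import BytesIO
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.platypus import Paragraph
buffer = BytesIO()
p = canvas.Canvas(buffer, pagesize=letter)
text = Paragraph("TEXT Nro 0001", orden)
text.wrapOn(p, 500, 10)
text.drawOn(p, 45, 200)
p.showPage()
p.save()
pdf = buffer.getvalue()
buffer.close()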
