I'd like to generate an odf file with odfpy, and am stuck on underlining text.
Here is a minimal example inspired by the official documentation, where I can't find any information about which attributes can be used and where.
Any suggestion?
from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties
from odf.text import H, P, Span
textdoc = OpenDocumentText()
ustyle = Style(name="Underline", family="text")
#uprop = TextProperties(fontweight="bold") #uncommented, this works well
#uprop = TextProperties(attributes={"fontsize":"26pt"}) #this either
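uprop = TextProperties(attributes={"underline":"solid"}) # bad guess, won't work!
ustyle.addElement(uprop)
textdoc.automaticstyles.addElement(ustyle)
p = P(text="Hello world. ")
underlinedpart = Span(stylename=ustyle, text="This part would like to be underlined. ")
p.addElement(underlinedpart)
p.addText("This is after the style test.")
textdoc.text.addElement(p)
textdoc.save("myfirstdocument.odt")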
Here is how I finally got it:
I created a sample document with underlining using LibreOffice, and unzipped it. Looking in the styles.xml part of the extracted files, I found the part that produces the underlining in the document:
<style:style style:name="Internet_20_link" style:display-name="Internet link" style:family="text">
<style:text-properties fo:color="#000080" fo:language="zxx" fo:country="none" style:text-underline-style="solid" style:text-underline-width="auto" style:text-underline-color="font-color" style:language-asian="zxx" style:country-asian="none" style:language-complex="zxx" style:country-complex="none"/>
The interesting style attributes are named text-underline-style, text-underline-width and text-underline-color.
To use them in odfpy, remove the '-' characters and pass the attribute keys as strings (in quotes), as in the following code. The correct style family (text in our case) must be specified in the Style constructor call.
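The renaming rule above can be sketched as a small helper. This is only an illustration: odfpy_attr_key is a hypothetical name of mine, not part of odfpy, and it assumes the rule is exactly "drop the namespace prefix, then drop the hyphens".

```python
def odfpy_attr_key(xml_attr_name: str) -> str:
    """Turn an ODF XML attribute name into the key form described above.

    Hypothetical helper, not an odfpy function.
    """
    # Drop the namespace prefix ("style:", "fo:", ...) if there is one.
    local_name = xml_attr_name.split(":", 1)[-1]
    # Remove the hyphens: "text-underline-style" -> "textunderlinestyle".
    return local_name.replace("-", "")


# Attribute names and values copied from the styles.xml excerpt.
underline_attributes = {
    odfpy_attr_key("style:text-underline-style"): "solid",
    odfpy_attr_key("style:text-underline-width"): "auto",
    odfpy_attr_key("style:text-underline-color"): "font-color",
}
print(underline_attributes)
```

The resulting dictionary has the shape that is passed as attributes= to TextProperties.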
from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties
from odf.text import H, P, Span
textdoc = OpenDocumentText()
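# underline style
ustyle = Style(name="Underline", family="text")  # here, the style family
uprop = TextProperties(attributes={
    "textunderlinestyle": "solid",
    "textunderlinewidth": "auto",
    "textunderlinecolor": "font-color",
})
ustyle.addElement(uprop)
textdoc.automaticstyles.addElement(ustyle)
p = P(text="Hello world. ")
underlinedpart = Span(stylename=ustyle, text="This part would like to be underlined. ")
p.addElement(underlinedpart)
p.addText("This is after the style test.")
textdoc.text.addElement(p)
textdoc.save("myfirstdocument.odt")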
I want to get the font name (the title present in the description of the font file) in python.
I looked at the fonttools module but could not find any way to extract the title using it.
How can I do this?
Here is how you could do it with fonttools:
from fontTools import ttLib
font = ttLib.TTFont(fontPath)  # fontPath: path to the font file
fontFamilyName = font['name'].getDebugName(1)
fullName = font['name'].getDebugName(4)
The numbers 1 and 4 are name IDs. If you need anything more, read this documentation about name IDs: https://learn.microsoft.com/en-us/typography/opentype/spec/name#name-ids
Here is fonttools documentation about the naming table: https://fonttools.readthedocs.io/en/latest/ttLib/tables/_n_a_m_e.html
If you need a more robust method to get a name from the naming table, you can use this logic:
import sys
from fontTools import ttLib
from fontTools.ttLib.tables._n_a_m_e import NameRecord
from typing import List

def sortNamingTable(names: List[NameRecord]) -> List[NameRecord]:
    """
    Parameters:
        names (List[NameRecord]): Naming table
    Returns:
        The sorted naming table.
        Based on FontConfig:
        - https://gitlab.freedesktop.org/fontconfig/fontconfig/-/blob/d863f6778915f7dd224c98c814247ec292904e30/src/fcfreetype.c#L1127-1140
    """
    def isEnglish(name: NameRecord) -> bool:
        # From: https://gitlab.freedesktop.org/fontconfig/fontconfig/-/blob/d863f6778915f7dd224c98c814247ec292904e30/src/fcfreetype.c#L1111-1125
        return (name.platformID, name.langID) in ((1, 0), (3, 0x409))
    # From: https://github.com/freetype/freetype/blob/b98dd169a1823485e35b3007ce707a6712dcd525/include/freetype/ttnameid.h#L86-L91
    PLATFORM_ID_APPLE_UNICODE = 0
    PLATFORM_ID_MACINTOSH = 1
    PLATFORM_ID_ISO = 2
    PLATFORM_ID_MICROSOFT = 3
    # From: https://gitlab.freedesktop.org/fontconfig/fontconfig/-/blob/d863f6778915f7dd224c98c814247ec292904e30/src/fcfreetype.c#L1078
    PLATFORM_ID_ORDER = [PLATFORM_ID_MICROSOFT, PLATFORM_ID_APPLE_UNICODE, PLATFORM_ID_MACINTOSH, PLATFORM_ID_ISO]
    return sorted(names, key=lambda name: (PLATFORM_ID_ORDER.index(name.platformID), name.platEncID, -isEnglish(name), name.langID))

def get_font_names(font: ttLib.TTFont, nameID: int) -> List[NameRecord]:
    """
    Parameters:
        font (ttLib.TTFont): Font
        nameID (int): An ID from the naming table. See: https://learn.microsoft.com/en-us/typography/opentype/spec/name#name-ids
    Returns:
        A list of each name that matches the nameID code.
        You may want to only use the first item of this list.
    """
    names = sortNamingTable(font['name'].names)
    return list(filter(lambda name: name.nameID == nameID, names))

def main():
    font_path = r"FONT_PATH"
    font = ttLib.TTFont(font_path)
    print(get_font_names(font, 1))

if __name__ == "__main__":
    sys.exit(main())
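To see what the sort key does without loading a font, here is a minimal sketch that uses plain namedtuples in place of NameRecord objects. FakeName and the sample records are made up for illustration, and the platform order (Microsoft, Apple Unicode, Macintosh, ISO) is my reading of the FontConfig source linked in the code, so treat it as an assumption.

```python
from collections import namedtuple

# Stand-in for fontTools' NameRecord, carrying only the fields the key reads.
FakeName = namedtuple("FakeName", "platformID platEncID langID text")

# Assumed order: Microsoft (3), Apple Unicode (0), Macintosh (1), ISO (2).
PLATFORM_ID_ORDER = [3, 0, 1, 2]


def is_english(name):
    # Macintosh English (langID 0) or Windows en-US (langID 0x409).
    return (name.platformID, name.langID) in ((1, 0), (3, 0x409))


def sort_key(name):
    # -True == -1 sorts before -False == 0, so English names come first
    # among records with the same platform and encoding.
    return (PLATFORM_ID_ORDER.index(name.platformID), name.platEncID,
            -is_english(name), name.langID)


records = [
    FakeName(1, 0, 0, "Mac English"),
    FakeName(3, 1, 0x407, "Windows German"),
    FakeName(3, 1, 0x409, "Windows English"),
]
ordered = [record.text for record in sorted(records, key=sort_key)]
print(ordered)
```

With these sample records the Windows English entry ends up first, which is why taking the first item of the sorted list is usually what you want.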
You can use external tools like otfinfo to extract font metadata.
otfinfo reports information about the named OpenType font files.
$ otfinfo --info raleway.ttf
Family: Raleway Light
Subfamily: Regular
Full name: Raleway Light
PostScript name: Raleway-Light
Preferred family: Raleway
Preferred subfamily: Light
You can call it using subprocess in Python and filter the desired result using a regular expression.
import subprocess
import re
font_file = "/home/user/raleway.ttf"
command = "otfinfo"
params = ["--info"]
result = subprocess.run([command, *params, font_file], stdout=subprocess.PIPE).stdout
font_name_re = re.compile(r"Full name:\s*(.*)")
font_name = font_name_re.findall(result.decode())
print(font_name[0])
Output: Raleway Light
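If you need more than one field, the same regular-expression idea can collect every "label: value" line into a dictionary. A minimal sketch, run on the sample output shown above instead of calling otfinfo; parse_otfinfo is a hypothetical helper name, and it assumes each line has the form "Label: value".

```python
import re

# Sample text copied from the otfinfo --info output shown above.
SAMPLE_OUTPUT = """\
Family:              Raleway Light
Subfamily:           Regular
Full name:           Raleway Light
PostScript name:     Raleway-Light
Preferred family:    Raleway
Preferred subfamily: Light
"""


def parse_otfinfo(text):
    """Collect 'Label: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


info = parse_otfinfo(SAMPLE_OUTPUT)
print(info["Full name"])
```

In a real script you would pass result.decode() from the subprocess call instead of SAMPLE_OUTPUT.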
I've got a problem with updating table of contents in docx-file, generated by python-docx on Linux. Generally, it is not difficult to create TOC (Thanks for this answer https://stackoverflow.com/a/48622274/9472173 and this thread https://github.com/python-openxml/python-docx/issues/36)
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
paragraph = self.document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = r'TOC \o "1-3" \h \z \u' # raw string so the backslashes are kept; change 1-3 depending on heading levels you need
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
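r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar3)
r_element.append(fldChar4)
p_element = paragraph._p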
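Since the heading range is the part of the field instruction you are most likely to change, it can help to build that string in one place. A small sketch, assuming the switches stay exactly as in the code above; toc_instruction is a hypothetical helper, not a python-docx function. It uses a raw f-string so the backslashes reach Word literally.

```python
def toc_instruction(first_level: int = 1, last_level: int = 3) -> str:
    """Build the TOC field instruction for a heading range (hypothetical helper)."""
    if not 1 <= first_level <= last_level <= 9:
        raise ValueError("heading levels must satisfy 1 <= first <= last <= 9")
    # Raw f-string: \o, \h, \z and \u are kept as written, not treated as escapes.
    return rf'TOC \o "{first_level}-{last_level}" \h \z \u'


print(toc_instruction(1, 3))
```

The result would be assigned to instrText.text in place of the hard-coded string.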
But to make the TOC visible, the fields have to be updated. The solution above requires updating them manually (right-click on the TOC hint and choose 'update fields'). For automatic updating, I've found the following solution, which automates the Word application (thanks to this answer https://stackoverflow.com/a/34818909/9472173)
import win32com.client
import inspect, os
def update_toc(docx_file):
    word = win32com.client.DispatchEx("Word.Application")
    doc = word.Documents.Open(docx_file)
    doc.TablesOfContents(1).Update()
    doc.Close(SaveChanges=True)
    word.Quit()

def main():
    script_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    file_name = 'doc_with_toc.docx'
    file_path = os.path.join(script_dir, file_name)
    update_toc(file_path)

if __name__ == "__main__":
    main()
It works well on Windows, but obviously not on Linux. Does anyone have any ideas about how to provide the same functionality on Linux? The only suggestion I have is to use local URLs (anchors) to every heading, but I am not sure whether that is possible with python-docx, and I'm not very strong with these OpenXML features. I would really appreciate any help.
I found a solution in this GitHub issue. It works on Ubuntu.
import lxml.etree
from docx import Document

def set_updatefields_true(docx_path):
    namespace = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
    doc = Document(docx_path)
    # add child to doc.settings element
    element_updatefields = lxml.etree.SubElement(
        doc.settings.element, f"{namespace}updateFields"
    )
    element_updatefields.set(f"{namespace}val", "true")
    doc.save(docx_path)
from docx.oxml import OxmlElement
import docx.oxml.ns as ns

def update_table_of_contents(doc):
    # Find the settings element in the document
    settings_element = doc.settings.element
    # Create an "updateFields" element and set its "val" attribute to "true"
    update_fields_element = OxmlElement('w:updateFields')
    update_fields_element.set(ns.qn('w:val'), 'true')
    # Add the "updateFields" element to the settings element
    settings_element.append(update_fields_element)
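Both snippets add the same small piece of XML to the document settings. To make that visible, here is a sketch that builds the element with the standard library only. It does not touch a real .docx file; the bare settings root is a stand-in for the settings part that python-docx exposes as doc.settings.element.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual "w:" prefix

# Stand-in for the root element of word/settings.xml.
settings = ET.Element(f"{{{W_NS}}}settings")

# The element both answers add: <w:updateFields w:val="true"/>
update_fields = ET.SubElement(settings, f"{{{W_NS}}}updateFields")
update_fields.set(f"{{{W_NS}}}val", "true")

xml_text = ET.tostring(settings, encoding="unicode")
print(xml_text)
```

My understanding is that this flag only asks the word processor to refresh fields when the file is opened, so the TOC is still filled in by the application and not by python-docx.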
I am using python 2.7 with docx and I would like to change the background and text color of cells in my table based on condition.
I could not find any useful resources about single-cell formatting.
Any suggestions?
Edit 1
my code
style_footer = "DarkList"
style_red = "ColorfulList"
style_yellow = "LightShading"
style_green = "MediumShading2-Accent6"
style_transperent = "TableNormal"
for a,rec in enumerate(data):
    # The heading shows the first field from the table head
    document.add_heading(rec['tableHead'][0][0], level=1)
    image_path = imageFolder + "\\" + slike[a]
    document.add_picture(image_path, height=Inches(3.5))
    #y += 28
    #worksheet.insert_image( y, 1,imageFolder + "/" + slike[a])
    for i, head in enumerate(rec['tableHead']):
        table = document.add_table(rows=1, cols = len(head))
        hdr_cells = table.rows[0].cells
        for a in range(0,len(head)):
            hdr_cells[a].text = head[a]
    for a,body in enumerate(rec['tableData']):
        row_cells = table.add_row().cells
        for a in range(0,len(body)):
            if body[a]['style'] == 'footer':
                stil = style_footer
            elif body[a]['style'] == 'red':
                stil = style_red
            elif body[a]['style'] == 'yellow':
                stil = style_yellow
            elif body[a]['style'] == 'green':
                stil = style_green
            else:
                stil = style_transperent
            row_cells[a].add_paragraph(body[a]['value'], stil)
All cells are still the same.
If you want to color fill a specific cell in a table you can use the code below.
For example let's say you need to fill the first cell in the first row of your table with the RGB color 1F5C8B:
from docx.oxml.ns import nsdecls
from docx.oxml import parse_xml
shading_elm_1 = parse_xml(r'<w:shd {} w:fill="1F5C8B"/>'.format(nsdecls('w')))
table.rows[0].cells[0]._tc.get_or_add_tcPr().append(shading_elm_1)
Now if you also want to fill the second cell in the first row with the same color, you should create a new element; otherwise, if you reuse the element above, the fill will move to the new cell and disappear from the first one...
shading_elm_2 = parse_xml(r'<w:shd {} w:fill="1F5C8B"/>'.format(nsdecls('w')))
table.rows[0].cells[1]._tc.get_or_add_tcPr().append(shading_elm_2)
...and so on for other cells.
Source: https://groups.google.com/forum/#!topic/python-docx/-c3OrRHA3qo
With Nikos Tavoularis' solution, we have to create a new element for every cell.
I have created a function that achieves this. Works in Python revision 3.5.6 and python-docx revision 0.8.10
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
def set_table_header_bg_color(cell):
    """
    set background shading for Header Rows
    """
    tblCell = cell._tc
    tblCellProperties = tblCell.get_or_add_tcPr()
    clShading = OxmlElement('w:shd')
    clShading.set(qn('w:fill'), "00519E") #Hex of Dark Blue Shade {R:0x00, G:0x51, B:0x9E}
    tblCellProperties.append(clShading)
    return cell
# End of set_table_header_bg_color Function

# main function
# 1. Load Document
# 2. Access the required section
# 3. Load the required Table
# 4. Traverse to the cell by accessing the rows object
for each_row in table.rows:
    for each_cell in each_row.cells:
        if each_cell.text.strip():  # placeholder condition: replace with your own test
            set_table_header_bg_color(each_cell)
# 5. Continue execution
What we found is that, if you do cell.add_paragraph('sometext', style_object), it will keep the existing empty paragraph and add an additional paragraph with the style, which is not ideal.
What you will want to do is something like:
# replace the entire content of cell with new text paragraph
cell.text = 'some text'
# assign new style to the first paragraph
cell.paragraphs[0].style = style_object
Note that the style is applied to the paragraph, not the cell, which isn't ideal for background colors (it won't fill the entire cell if you have some padding). I haven't found a way around that, except when you want EVERY cell to have a background color, in which case you can apply a style to table.style.
Also, make sure that your styles are defined. You can check
styles = document.styles
for s in styles:
    print(s.name)
to see all the styles you have. You can define new styles and also load a template document with pre-defined styles already.
It looks like instead of using the cell.text = "Something" method you need to use cell.add_paragraph("SomeText", a_style) with a defined style. If you use the "default" template document, you can pick one of its built-in styles; otherwise you will have to create your own.
Building on Nikos Tavoularis' answer, I would just inline the shading_elm_1 declaration, because if you set the cell color in a loop, for instance, things might get messy.
As such, my suggestion would be:
from docx.oxml.ns import nsdecls
from docx.oxml import parse_xml
table.rows[0].cells[0]._tc.get_or_add_tcPr().append(parse_xml(r'<w:shd {} w:fill="1F5C8B"/>'.format(nsdecls('w'))))
I made a video demonstrating a way to do it. I took inspiration from the answers above, but I still had issues, so I made this to help others.
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
document = Document("yourfile.docx")
Table = document.tables[0]
cell_xml_element = Table.rows[1].cells[0]._tc
table_cell_properties = cell_xml_element.get_or_add_tcPr()
shade_obj = OxmlElement('w:shd')
shade_obj.set(qn('w:fill'), "ff00ff")
table_cell_properties.append(shade_obj)
document.save("yourfile.docx")
The code above changes the first cell of the second row of the first table in the document.
If you want to loop through the cells in a row use:
def color_row(row=0):
    'make row of cells background colored, defaults to column header row'
    row = t.rows[row]  # t is the table object
    for cell in row.cells:
        shading_elm_2 = parse_xml(r'<w:shd {} w:fill="1F5C8B"/>'.format(nsdecls('w')))
        cell._tc.get_or_add_tcPr().append(shading_elm_2)
Run the function to color the cells in the second row (rows are zero-indexed):
color_row(1)
If you want to change the text color too, you can set it on the runs within the cell. I wrote this function to handle the cell background and text colors together (using Nikos' method for the fill):
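from docx.oxml import parse_xml
from docx.oxml.ns import nsdecls
from docx.shared import RGBColor

def shade_cell(cell, fill=None, color=None):
    if fill:
        shading_elm = parse_xml(r'<w:shd {} w:fill="{}"/>'.format(nsdecls('w'), fill))
        cell._tc.get_or_add_tcPr().append(shading_elm)
    if color:
        for p in cell.paragraphs:
            for r in p.runs:
                r.font.color.rgb = RGBColor.from_string(color)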
I originally tried to expand Nikos' solution by adding w:color="XXXXXX" to the w:shd tag but that didn't work for me. However setting the font color on each run got the result I wanted.
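One thing to watch with helpers like this: the colors are six-digit hex strings such as "1F5C8B", both for the w:fill attribute and for RGBColor.from_string. A small guard can catch a stray "#" or a color name before it ends up in the XML. This is a sketch only; normalize_hex_color is a hypothetical helper, and it assumes you always want plain "RRGGBB" values.

```python
import re


def normalize_hex_color(value: str) -> str:
    """Return the color as 'RRGGBB', or raise ValueError (hypothetical helper)."""
    candidate = value.strip().lstrip("#").upper()
    if not re.fullmatch(r"[0-9A-F]{6}", candidate):
        raise ValueError(f"expected a six-digit hex color, got {value!r}")
    return candidate


print(normalize_hex_color("#1f5c8b"))
```

You could call it on the fill and color arguments before building the shading element or setting the font color.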
I have compiled the previous answers and added some features.
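Feel free to test: create a new file and run the "main" part at the bottom.
""" adder for python-docx in order to change text style in tables:
font color, italic, bold
cell background color
based on answers on
https://stackoverflow.com/questions/26752856/python-docx-set-table-cell-background-and-text-color
"""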
import docx # import python-docx (in order to create .docx report file)
from docx.oxml.ns import nsdecls
from docx.oxml import parse_xml
def change_table_cell(cell, background_color=None, font_color=None, font_size=None, bold=None, italic=None):
    """ changes the background_color or font_color or font style (bold, italic) of this cell.
    Leave the params as 'None' if you do not want to change them.
    cell: the cell to manipulate
    background_color: fill color of the cell as a hex string, e.g. "ff0000"
    font_color: font color as a hex string, e.g. "ff0000"
    font_size: size in pt (e.g. 10)
    bold: requested font style. True or False, or None if it shall remain unchanged
    italic: requested font style. True or False, or None if it shall remain unchanged
    """
    if background_color:
        shading_elm = parse_xml(r'<w:shd {} w:fill="{}"/>'.format(nsdecls('w'), background_color))
        cell._tc.get_or_add_tcPr().append(shading_elm)
    if font_color:
        for p in cell.paragraphs:
            for r in p.runs:
                r.font.color.rgb = docx.shared.RGBColor.from_string(font_color)
    if font_size:
        for p in cell.paragraphs:
            for r in p.runs:
                r.font.size = docx.shared.Pt(font_size)
    if bold is not None:
        for p in cell.paragraphs:
            for r in p.runs:
                r.bold = bold
    if italic is not None:
        for p in cell.paragraphs:
            for r in p.runs:
                r.italic = italic

def change_table_row(table_row, background_color=None, font_color=None, font_size=None, bold=None, italic=None):
    for cell in table_row.cells:
        change_table_cell(cell, background_color=background_color, font_color=font_color, font_size=font_size,
                          bold=bold, italic=italic)
if __name__ == "__main__": # do the following code only if we run the file itself
    #document = docx.Document('template.docx') # create an instance of a word document, use the style that we have defined in 'template.docx'
    document = docx.Document()
    num_rows = 4
    num_cols = 3
    table = document.add_table(rows=num_rows, cols=num_cols) # create empty table
    #table.style = document.styles['MyTableStyleBlue'] # test overwriting the predefined style
    # fill table
    for row in range(num_rows):
        for col in range(num_cols):
            table.rows[row].cells[col].text = f'row/col=({row},{col})'
    """ change color (see https://stackoverflow.com/questions/26752856/python-docx-set-table-cell-background-and-text-color) """
    # Nikos Tavoularis answered Apr 18, 2017 at 8:38
    shading_elm_1 = parse_xml(r'<w:shd {} w:fill="1F5C8B"/>'.format(nsdecls('w')))
    table.rows[0].cells[0]._tc.get_or_add_tcPr().append(shading_elm_1)
    # test new function derived from dyoung's answer of May 25 at 7:34, 2022
    change_table_cell(table.rows[0].cells[0], background_color=None, font_color="ff0000", bold=False)
    change_table_cell(table.rows[1].cells[2], background_color="00ff00", font_color="ff0000", font_size=20, bold=True)
    change_table_row(table.rows[3], background_color="90ee90", font_color="0000ff", italic=True) # "90ee90" is the hex value of lightgreen, see https://www.delftstack.com/howto/python/colors-in-python/
    document.save("test_table_colors.docx") # the output file name is arbitrary
Since yesterday I'm trying to extract the text from some highlighted annotations in one pdf, using python-poppler-qt4.
According to this documentation, it looks like I have to get the text using the Page.text() method, passing a Rectangle argument from the highlighted annotation, which I get using Annotation.boundary(). But I only get blank text. Can someone help me? I copied my code below and added a link to the PDF I am using. Thanks for any help!
import popplerqt4
import sys
import PyQt4
def main():
    doc = popplerqt4.Poppler.Document.load(sys.argv[1])
    total_annotations = 0
    for i in range(doc.numPages()):
        page = doc.page(i)
        annotations = page.annotations()
        if len(annotations) > 0:
            for annotation in annotations:
                if isinstance(annotation, popplerqt4.Poppler.Annotation):
                    total_annotations += 1
                    if(isinstance(annotation, popplerqt4.Poppler.HighlightAnnotation)):
                        print str(page.text(annotation.boundary()))
    if total_annotations > 0:
        print str(total_annotations) + " annotation(s) found"
    else:
        print "no annotations found"

if __name__ == "__main__":
    main()
Test pdf:
Looking at the documentation for Annotations, it seems that the boundary property "Returns this annotation's boundary rectangle in normalized coordinates". Although this seems a strange decision, we can simply scale the coordinates by the page.pageSize().width() and .height() values.
import popplerqt4
import sys
import PyQt4
def main():
    doc = popplerqt4.Poppler.Document.load(sys.argv[1])
    total_annotations = 0
    for i in range(doc.numPages()):
        #print("========= PAGE {} =========".format(i+1))
        page = doc.page(i)
        annotations = page.annotations()
        (pwidth, pheight) = (page.pageSize().width(), page.pageSize().height())
        if len(annotations) > 0:
            for annotation in annotations:
                if isinstance(annotation, popplerqt4.Poppler.Annotation):
                    total_annotations += 1
                    if(isinstance(annotation, popplerqt4.Poppler.HighlightAnnotation)):
                        quads = annotation.highlightQuads()
                        txt = ""
                        for quad in quads:
                            rect = (quad.points[0].x() * pwidth,
                                    quad.points[0].y() * pheight,
                                    quad.points[2].x() * pwidth,
                                    quad.points[2].y() * pheight)
                            bdy = PyQt4.QtCore.QRectF()
                            bdy.setCoords(*rect)
                            txt = txt + unicode(page.text(bdy)) + ' '
                        #print("========= ANNOTATION =========")
                        print(unicode(txt))
    if total_annotations > 0:
        print str(total_annotations) + " annotation(s) found"
    else:
        print "no annotations found"

if __name__ == "__main__":
    main()
Additionally, I decided to concatenate the .highlightQuads() to get a better representation of what was actually highlighted.
Please be aware of the explicit <space> I have appended to each quad region of text.
In the example document, the returned QString could not be passed directly to print() or str(); the solution was to use unicode() instead.
I hope this helps someone as it helped me.
Note: Page rotation may affect the scaling values, I have not been able to test this.
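The scaling step on its own, without poppler, looks like this. It is a minimal sketch: scale_rect is a hypothetical helper, the rectangle is given as (x1, y1, x2, y2) in normalized 0..1 coordinates, and page rotation is ignored, as in the code above.

```python
def scale_rect(normalized, page_width, page_height):
    """Scale a normalized (x1, y1, x2, y2) rectangle to page units.

    Hypothetical helper; page rotation is not handled.
    """
    x1, y1, x2, y2 = normalized
    return (x1 * page_width, y1 * page_height,
            x2 * page_width, y2 * page_height)


# A made-up highlight on a made-up 600 x 800 page.
rect = scale_rect((0.25, 0.5, 0.75, 0.625), 600, 800)
print(rect)
```

The four values correspond to the arguments passed to QRectF.setCoords() in the answer.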
I'm trying to add a simple "page x of y" to a report made with ReportLab.. I found this old post about it, but maybe six years later something more straightforward has emerged? ^^;
I found this recipe too, but when I use it, the resulting PDF is missing the images..
I was able to implement the NumberedCanvas approach from ActiveState. It was very easy to do and did not change much of my existing code. All I had to do was add that NumberedCanvas class and add the canvasmaker attribute when building my doc. I also changed the measurements of where the "x of y" was displayed:
self.doc.build(pdf, canvasmaker=NumberedCanvas)
doc is a BaseDocTemplate and pdf is my list of flowable elements.
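Use doc.multiBuild
and in the page header method (defined by "onLaterPages="):
global TOTALPAGES  # module-level counter that survives between build passes
if doc.page > TOTALPAGES:
    TOTALPAGES = doc.page
canvas.drawString(270 * mm, 5 * mm, "Seite %d/%d" % (doc.page,TOTALPAGES))  # "Seite" is German for "Page"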
Just digging up some code for you, we use this:
Now self._on_page is a method that gets called for each page like:
def _on_page(self, canvas, doc):
    # ... do any additional page formatting here for each page
    print(doc.page)
I came up with a solution for platypus, that is easier to understand (at least I think it is). You can manually do two builds. In the first build, you can store the total number of pages. In the second build, you already know it in advance. I think it is easier to use and understand, because it works with platypus level event handlers, instead of canvas level events.
import copy
import io
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
styles = getSampleStyleSheet()
Title = "Hello world"
pageinfo = "platypus example"
total_pages = 0
def on_page(canvas, doc: SimpleDocTemplate):
    global total_pages
    total_pages = max(total_pages, doc.page)
    canvas.setFont('Times-Roman', 9)
    canvas.drawString(inch, 0.75 * inch, "Page %d of %d" % (doc.page, total_pages))

Story = [Spacer(1, 2 * inch)]
style = styles["Normal"]
for i in range(100):
    bogustext = ("This is Paragraph number %s. " % i) * 20
    p = Paragraph(bogustext, style)
    Story.append(p)
    Story.append(Spacer(1, 0.2 * inch))
# You MUST use a deep copy of the story!
# https://mail.python.org/pipermail/python-list/2022-March/905728.html
# First pass
with io.BytesIO() as out:
    doc = SimpleDocTemplate(out)
    doc.build(copy.deepcopy(Story), onFirstPage=on_page, onLaterPages=on_page)
# Second pass
with open("test.pdf", "wb+") as out:
    doc = SimpleDocTemplate(out)
    doc.build(copy.deepcopy(Story), onFirstPage=on_page, onLaterPages=on_page)
You just need to make sure that you always render a deep copy of your original story. Otherwise it won't work. (You will either get an empty page as the output, or a render error saying that a Flowable doesn't fit in the frame.)
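If it is not obvious why two builds are needed, here is a toy version of the idea without ReportLab. fake_build and the footers list are stand-ins I made up for doc.build() and the canvas; only the global counter works the same way as in the answer.

```python
total_pages = 0  # shared between passes, like the global in the answer


def on_page(page_number, footers):
    """Stand-in for the page callback: record the footer text of one page."""
    global total_pages
    total_pages = max(total_pages, page_number)
    footers.append("Page %d of %d" % (page_number, total_pages))


def fake_build(page_count):
    """Stand-in for doc.build(): call on_page once per page, in order."""
    footers = []
    for page_number in range(1, page_count + 1):
        on_page(page_number, footers)
    return footers


first_pass = fake_build(3)   # totals are still growing while pages render
second_pass = fake_build(3)  # the total is already known from the first pass
print(first_pass)
print(second_pass)
```

In the first pass every footer shows the page count seen so far, so only the second pass gets the right total on every page. That is why the first build goes to a throwaway buffer.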