There is a good example for Python Docx.
I have used multiple document.add_heading('xxx', level=Y) and can see when I open the generated document in MS Word that the levels are correct.
What I don't see is numbering, such a 1, 1.1, 1.1.1, etc I just see the heading text.
How can I display heading numbers, using Docx ?
Alphanumeric heading prefixes are automatically created based on the outline style and level of the heading. Set the outline style and insert the correct level and you will get the numbering.
From documentation:
_NumberingStyle objects class docx.styles.style._NumberingStyle[source] A numbering style. Not yet
implemented.
However, if you set the heading like this:
paragraph.style = document.styles['Heading 1']
then it should default to the latent numbering style of that heading.
There is a great work around with python docx for achieving complex operations like headings enumerations. Here how to proceed:
Create a new blank document in Word and define your complex individual multilevel list there .
Save the blank document as my_template.docx.
Create the document in your python script based on your template:
from docx import Document
document = Document("my_template.docx")
document.add_heading('My first numbered heading', level=1)
And, voila, you can generate your perfect document with numbered headings.
this answer will realy help you
first you need to new a without number header like this
paragraph = document.add_paragraph()
paragraph.style = document.styles['Heading 4']
then you will have xml word like this
<w:pPr>
<w:pStyle w:val="4"/>
</w:pPr>
then you can access xml word "pStyle" property and change it using under code
header._p.pPr.pStyle.set(qn('w:val'), u'4FDD')
final, open word file you will get what you want !!!
def __str__(self):
if self.nivel == 1:
return str(Level.count_1)+'.- '+self.titulo
elif self.nivel==2: #Imprime si es del nivel 2
return str(Level.count_1)+'.'+str(Level.count_2)+'.- '+self.titulo
elif self.nivel==3: #Imprime si es del nivel 3
return str(Level.count_1)+'.'+str(Level.count_2)+'.'+str(Level.count_3)+'.- '+self.titulo
Related
As the title says, I'm trying to copy a select section of text to a new word document. Basically, I have a bunch of annual reports with sections that are systemically named (i.e. Project 1, Project 2, etc..) I want to be able to search for a select section and copy that section into a report for the individual project. I've been looking through docx documentation and aspose.words documentation. This is the closest I've gotten to what I'm looking for but it's still not quite right:
# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document(docs_base.my_dir + "Big document.docx")
for i in range(0, doc.sections.count) :
# Split a document into smaller parts, in this instance, split by section.
section = doc.sections[i].clone()
newDoc = aw.Document()
newDoc.sections.clear()
newSection = newDoc.import_node(section, True).as_section()
newDoc.sections.add(newSection)
# Save each section as a separate document.
newDoc.save(docs_base.artifacts_dir + f"SplitDocument.by_sections_{i}.docx")
I suspect by saying section you mean just content tat starts with some heading paragraph. This does not correspond to Section node in Aspose.Words Document Object Model. Section in MS Word are used if you need to have different page setup or headers/footers within the document.
It is difficult to say without your document, but I suppose there is only one section in your document. It looks like you need to extract content form the document based on the styles.
Also, I have create a simple code example that demonstrates the basic technique:
import aspose.words as aw
# Geneare document with heading paragraphs (just for demonstrations purposes).
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
for i in range(0, 5):
builder.paragraph_format.style_identifier = aw.StyleIdentifier.HEADING1
builder.writeln("Project {0}".format(i))
builder.paragraph_format.style_identifier = aw.StyleIdentifier.NORMAL
for j in range(0, 10):
builder.writeln("This is the project {0} content {1}, each project section will be extracted into a separate document.".format(i, j))
doc.save("C:\\Temp\\out.docx")
# Now split the document by heading paragraphs.
# Code is just for demonstration purposes and supposes there is only one section in the document.
subDocumentIndex = 0
subDocument = doc.clone(False).as_document()
for child in doc.first_section.body.child_nodes:
if child.node_type == aw.NodeType.PARAGRAPH and child.as_paragraph().paragraph_format.style_identifier == aw.StyleIdentifier.HEADING1:
if not subDocument.has_child_nodes:
subDocument.ensure_minimum()
else:
# save subdocument
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))
subDocumentIndex = subDocumentIndex+1
subDocument = doc.clone(False).as_document()
subDocument.ensure_minimum()
# Remove body content.
subDocument.first_section.body.remove_all_children()
# import current node to the subdocument.
dst_child = subDocument.import_node(child, True, aw.ImportFormatMode.USE_DESTINATION_STYLES)
subDocument.first_section.body.append_child(dst_child)
# save the last document.
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))
I have been using python docx library and oxml to automate some changes to my tables in my word document. Unfortunately, no matter what I do, I cannot wrap the text in the table cells.
I managed to successfully manipulate 'autofit' and 'fit-text' properties of my table, but non of them contribute to the wrapping of the text in the cells. I can see that there is a "w:noWrap" in the xml version of my word document and no matter what I do I cannot manipulate and remove it. I believe it is responsible for the word wrapping in my table.
for example in this case I am adding a table. I can fit text in cell and set autofit to 'true' but cannot for life of me wrap the text:
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
doc = Document()
table = doc.add_table(5,5)
table.autofit = True # Does Autofit but not wrapping
tc = table.cell(0,0)._tc # As a test, fit text to cell 0,0
tcPr = tc.get_or_add_tcPr()
tcFitText = OxmlElement('w:tcFitText')
tcFitText.set(qn('w:val'),"true")
tcPr.append(tcFitText) #Does fitting but no wrapping
doc.save('demo.docx')
I would appreciate any help or hints.
The <w:noWrap> element appears to be a child of <w:tcPr>, the element that controls table cell properties.
You should be able to access it from the table cell element using XPath:
tc = table.cell(0, 0)._tc
noWraps = tc.xpath(".//w:noWrap")
The noWraps variable here will then be a list containing zero or more <w:noWrap> elements, in your case probably one.
Deleting it is probably the simplest approach, which you can accomplish like this:
if noWraps: # ---skip following code if list is empty---
noWrap = noWraps[0]
noWrap.getparent().remove(noWrap)
You can also take the approach of setting the value of the w:val attribute of the w:noWrap element, but then you have to get into specifying the Clark name of the attribute namespace, which adds some extra fuss and doesn't really produce a different outcome unless for some reason you want to keep that element around.
I am generating a doc using python docx module.
I want to bold the specific cell of a row in python docx
here is the code
book_title = '\n๐๐ข๐ญ๐ฅ๐-:\n {}\n\n'.format(book_title)
book_desc = '๐๐ฎ๐ญ๐ก๐จ๐ซ-: {}\n\n๐๐๐ฌ๐๐ซ๐ข๐ฉ๐ญ๐ข๐จ๐ง:\n{}\n\n๐๐๐ฅ๐๐ฌ ๐๐จ๐ข๐ง๐ญ๐ฌ:\n{}'.format(book.author,book_description,sales_point)
row1.cells[1].text = (book_title + book_desc)
I just want to bold the book_title.
If I apply a style it automatically applies to whole document.
A cell does not have a character style; character style can only be applied to text, and in particular to a run of text. This is in fact the defining characteristic of a run, being a sequence of characters that share the same character formatting, also known as font in python-docx.
To get the book title with a different font than the description, they need to appear in separate runs. Assigning to Cell.text (as you have) results in all the text being in a single run.
This might work for you, but assumes the cell is empty as you start:
paragraph = row1.cells[1].paragraphs[0]
title_run = paragraph.add_run(book_title)
description_run = paragraph.add_run(book_desc)
title_run.bold = True
This code can be made more compact:
paragraph = row1.cells[1].paragraphs[0]
paragraph.add_run(book_title).bold = True
paragraph.add_run(book_desc)
but perhaps the former version makes it more clear just what you're doing in each step.
Here is how I understand it:
Paragraph is holding the run objects and styles (bold, italic) are methods of run.
So following this logic here is what might solve your question:
row1_cells[0].paragraphs[0].add_run(book_title + book_desc).bold=True
This is just an example for the first cell of the table. Please amend it in your code.
Since you are using the docx module, you can style your text/paragraph by explicitly defining the style.
In order to apply a style, use the following code snippet referenced from docx documentation here.
>>> from docx import Document
>>> document = Document()
>>> style = document.styles['Normal']
>>> font = style.font
>>> font.bold= True
This will change the font style to bold for the applied paragraph.
In python-docx, the styling of any character in a docx template document can be overridden by the use of Rich Text styling. You should provide a context variable for the particular character/string that needs styling in your template, at the position of the character/string. This variable maps to the RichText object that has the style definition(that you define in your code), to style the character/string. To make things clearer, consider an example template doc "test.docx" that contains the following text:
Hello {{r context_var}}!
The {{..}} is the jinja2 tag syntax and {{r is the RichText tag that overrides the character styling. The context_var is a variable that maps the styling to your character string.
We accomplish Rich Text styling like this:
from docxtpl import DocxTemplate, RichText
doc = DocxTemplate("test.docx")
rt = RichText() #create a RichText object
rt.add('World', bold=True) #pass the text as an argument and the style, bold=True
context = { 'context_var': rt } #add context variable to the context and map it to rt
doc.render(context) #render the context
doc.save("generated_doc.docx") #save as a new document
Let's look at the contents of "generated_doc.docx":
Hello World!
I'm not sure how your template is designed, but if you just want the book_title as bold, your template "test.docx" should have text like:
Title:-
{{r book_title_var}}
The code should be modified to:
book_title = "Lord of the Rings" #or wherever you get the book title from
rt.add(book_title, bold=True)
context = { 'book_title_var': rt }
generated_doc.docx:
Title:-
Lord of the Rings
Is there any way to access and manipulate text in an existing docx document in a textbox with python-docx?
I tried to find a keyword in all paragraphs in a document by iteration:
doc = Document('test.docx')
for paragraph in doc.paragraphs:
if '<DATE>' in paragraph.text:
print('found date: ', paragraph.text)
It is found if placed in normal text, but not inside a textbox.
A workaround for textboxes that contain only formatted text is to use a floating, formatted table. It can be styled almost like a textbox (frames, colours, etc.) and is easily accessible by the docx API.
doc = Document('test.docx')
for table in doc.tables:
for row in table.rows:
for cell in row.cells:
for paragraph in cell.paragraphs:
if '<DATE>' in paragraph.text:
print('found date: ', paragraph.text)
Not via the API, not yet at least. You'd have to uncover the XML structure it lives in and go down to the lxml level and perhaps XPath to find it. Something like this might be a start:
body = doc._body
# assuming differentiating container element is w:textBox
text_box_p_elements = body.xpath('.//w:textBox//w:p')
I have no idea whether textBox is the actual element name here, you'd have to sort that out with the rest of the XPath path details, but this approach will likely work. I use similar approaches frequently to work around features that aren't built into the API yet.
opc-diag is a useful tool for inspecting the XML. The basic approach is to create a minimally small .docx file containing the type of thing you're trying to locate. Then use opc-diag to inspect the XML Word generates when you save the file:
$ opc browse test.docx document.xml
http://opc-diag.readthedocs.org/en/latest/index.html
I use Python-docx to generate Microsoft Word document.The user want that when he write for eg: "Good Morning every body,This is my %(profile_img)s do you like it?"
in a HTML field, i create a word document and i recuper the picture of the user from the database and i replace the key word %(profile_img)s by the picture of the user NOT at the END OF THE DOCUMENT. With Python-docx we use this instruction to add a picture:
document.add_picture('profile_img.png', width=Inches(1.25))
The picture is added to the document but the problem that it is added at the end of the document.
Is it impossible to add a picture in a specific position in a microsoft word document with python? I've not found any answers to this in the net but have seen people asking the same elsewhere with no solution.
Thanks (note: I'm not a hugely experiance programmer and other than this awkward part the rest of my code will very basic)
Quoting the python-docx documentation:
The Document.add_picture() method adds a specified picture to the end of the document in a paragraph of its own. However, by digging a little deeper into the API you can place text on either side of the picture in its paragraph, or both.
When we "dig a little deeper", we discover the Run.add_picture() API.
Here is an example of its use:
from docx import Document
from docx.shared import Inches
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
r.add_picture('/tmp/foo.jpg')
r.add_text(' do you like it?')
document.save('demo.docx')
well, I don't know if this will apply to you but here is what I've done to set an image in a specific spot to a docx document:
I created a base docx document (template document). In this file, I've inserted some tables without borders, to be used as placeholders for images. When creating the document, first I open the template, and update the file creating the images inside the tables. So the code itself is not much different from your original code, the only difference is that I'm creating the paragraph and image inside a specific table.
from docx import Document
from docx.shared import Inches
doc = Document('addImage.docx')
tables = doc.tables
p = tables[0].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('resized.png',width=Inches(4.0), height=Inches(.7))
p = tables[1].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('teste.png',width=Inches(4.0), height=Inches(.7))
doc.save('addImage.docx')
Here's my solution. It has the advantage on the first proposition that it surrounds the picture with a title (with style Header 1) and a section for additional comments. Note that you have to do the insertions in the reverse order they appear in the Word document.
This snippet is particularly useful if you want to programmatically insert pictures in an existing document.
from docx import Document
from docx.shared import Inches
# ------- initial code -------
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
picPath = 'D:/Development/Python/aa.png'
r.add_picture(picPath)
r.add_text(' do you like it?')
document.save('demo.docx')
# ------- improved code -------
document = Document()
p = document.add_paragraph('Picture bullet section', 'List Bullet')
p = p.insert_paragraph_before('')
r = p.add_run()
r.add_picture(picPath)
p = p.insert_paragraph_before('My picture title', 'Heading 1')
document.save('demo_better.docx')
This is adopting the answer written by Robแตฉ while considering more flexible input from user.
My assumption is that the HTML field mentioned by Kais Dkhili (orignal enquirer) is already loaded in docx.Document(). So...
Identify where is the related HTML text in the document.
import re
## regex module
img_tag = re.compile(r'%\(profile_img\)s') # declare pattern
for _p in enumerate(document.paragraphs):
if bool(img_tag.match(_p.text)):
img_paragraph = _p
# if and only if; suggesting img_paragraph a list and
# use append method instead for full document search
break # lose the break if want full document search
Replace desired image into placeholder identified as img_tag = '%(profile_img)s'
The following code is after considering the text contains only a single run
May be changed accordingly if condition otherwise
temp_text = img_tag.split(img_paragraph.text)
img_paragraph.runs[0].text = temp_text[0]
_r = img_paragraph.add_run()
_r.add_picture('profile_img.png', width = Inches(1.25))
img_paragraph.add_run(temp_text[1])
and done. document.save() it if finalised.
In case you are wondering what to expect from the temp_text...
[In]
img_tag.split(img_paragraph.text)
[Out]
['This is my ', ' do you like it?']
I spend few hours in it. If you need to add images to a template doc file using python, the best solution is to use python-docx-template library.
Documentation is available here
Examples available in here
This is variation on a theme. Letting I be the paragraph number in the specific document then:
p = doc.paragraphs[I].insert_paragraph_before('\n')
p.add_run().add_picture('Fig01.png', width=Cm(15))