Python docx not finding all sections

Python docx not finding all sections - python

I need this code to be able to work with existing documents I have, but it is only finding one footer in each document. I need it to find all 3 (first page, even, and odd).
Here is the link to my test document: http://www.filedropper.com/testdoc_1
from docx import Document
document = Document('C:/Users/username/Desktop/SPECS/TEST DOC.docx')
print(len(document.sections))
for section in document.sections:
footer = section.footer
footer.paragraphs[0].text = footer.paragraphs[0].text.replace("ISSUED FOR CONSTRUCTION", "DESIGN DEVELOPMENT")
document.save('C:/Users/username/Desktop/SPECS/TEST DOC.docx')

So I answered my question by reading this documentation: https://python-docx.readthedocs.io/en/latest/api/section.html
The default is the odd page footer. To access the other footers (if you enabled different first page and even footers) you have to use the below code:
first_page_footer = section.first_page_footer
even_page_footer = section.even_page_footer
odd_page_footer = section.footer

Related

Copy specific word section to new document in python

As the title says, I'm trying to copy a select section of text to a new word document. Basically, I have a bunch of annual reports with sections that are systemically named (i.e. Project 1, Project 2, etc..) I want to be able to search for a select section and copy that section into a report for the individual project. I've been looking through docx documentation and aspose.words documentation. This is the closest I've gotten to what I'm looking for but it's still not quite right:
# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document(docs_base.my_dir + "Big document.docx")
for i in range(0, doc.sections.count) :
# Split a document into smaller parts, in this instance, split by section.
section = doc.sections[i].clone()
newDoc = aw.Document()
newDoc.sections.clear()
newSection = newDoc.import_node(section, True).as_section()
newDoc.sections.add(newSection)
# Save each section as a separate document.
newDoc.save(docs_base.artifacts_dir + f"SplitDocument.by_sections_{i}.docx")

I suspect by saying section you mean just content tat starts with some heading paragraph. This does not correspond to Section node in Aspose.Words Document Object Model. Section in MS Word are used if you need to have different page setup or headers/footers within the document.
It is difficult to say without your document, but I suppose there is only one section in your document. It looks like you need to extract content form the document based on the styles.
Also, I have create a simple code example that demonstrates the basic technique:
import aspose.words as aw
# Geneare document with heading paragraphs (just for demonstrations purposes).
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
for i in range(0, 5):
builder.paragraph_format.style_identifier = aw.StyleIdentifier.HEADING1
builder.writeln("Project {0}".format(i))
builder.paragraph_format.style_identifier = aw.StyleIdentifier.NORMAL
for j in range(0, 10):
builder.writeln("This is the project {0} content {1}, each project section will be extracted into a separate document.".format(i, j))
doc.save("C:\\Temp\\out.docx")
# Now split the document by heading paragraphs.
# Code is just for demonstration purposes and supposes there is only one section in the document.
subDocumentIndex = 0
subDocument = doc.clone(False).as_document()
for child in doc.first_section.body.child_nodes:
if child.node_type == aw.NodeType.PARAGRAPH and child.as_paragraph().paragraph_format.style_identifier == aw.StyleIdentifier.HEADING1:
if not subDocument.has_child_nodes:
subDocument.ensure_minimum()
else:
# save subdocument
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))
subDocumentIndex = subDocumentIndex+1
subDocument = doc.clone(False).as_document()
subDocument.ensure_minimum()
# Remove body content.
subDocument.first_section.body.remove_all_children()
# import current node to the subdocument.
dst_child = subDocument.import_node(child, True, aw.ImportFormatMode.USE_DESTINATION_STYLES)
subDocument.first_section.body.append_child(dst_child)
# save the last document.
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))

Adding page numbers in header or footer using versions 0.8.8+

Recently, version 0.8.8 of python-docx added direct support for headers and footers.
Now one can simply add a header or footer as follows:
from docx import Document
document = Document()
header = document.sections[0].header
header.add_paragraph('This is an example Header')
footer = document.sections[0].footer
footer.add_paragraph('This is an example Footer')
Prior to this release, one could add headers and footers flexibly using a template approach.
With templates it is very straightforward to include things such as page numbers. However, this does not seem to be the case with the implementation of headers and footers in the new versions.
Is there an easy way to add page numbers in versions 0.8.8 and later?

I believe you'll find that an "auto" page-number of the sort used in a header or footer is a type of field. Fields have not yet been implemented in python-docx, so you'd be on your own for that, having to add the required XML from a point as close as you can get, which I expect would be the <w:r> element for a run.
The way I would approach it would be to add a page-number in a header using Word, and then inspect the resulting XML using opc-diag. That would establish specifically what XML needs to go where.
From there you can get a run element using r = run._r and then use lxml calls to insert the XML you need.

How can I find and replace a text in footer of word document using "Python docx"

enter image description hereI have created a word document and added a footer in that document using python docx. But now I want to replace text in the footer.
For example: I have a paragraph in footer i.e "Page 1 to 10". I want to find word "to" and replace it with "of" so my result will be "Page 1 of 10".
My code:
from docx import Document
document = Document()
document.add_heading('Document Title', 0)
section = document.sections[0]
footer = section.footer
footer.add_paragraph("Page 1 to 10")
document.save('mydoc.docx')

Try this solution.
You would have to iterate over all the footers in document
from docx import Document
document = Document('mydoc.docx')
for section in document.sections:
footer = section.footer
footer.paragraphs[1].text = footer.paragraphs[1].text.replace("to", "of")
document.save('mydoc.docx')
The reason why this code is editing the second element of the footer paragraphs list is because you have added another paragraph to the footer in your code. By default there is an empty paragraph already in the footer according to the documentation

I know this is an old question however the accepted answer changes the page number of the footer and requires editing of the XML if you want to fix that issue. Instead I have found that if you edit the text using Runs, it will keep the original format for your header/footer. https://python-docx.readthedocs.io/en/latest/api/text.html#docx.text.run.Run
from docx import Document
document = Document('mydoc.docx')
for section in document.sections:
footer = section.footer
footer.paragraphs[1].runs[0].text = footer.paragraphs[1].runs[0].text.replace("to", "of")
document.save('mydoc.docx')

Adding hyperlinks in Microsoft word using python

I've been searching for the last few hours and cannot find a library that allows me to add hyperlinks to a word document using python. In my ideal world I'd be able to manipulate a word doc using python to add hyperlinks to footnotes which link to internal documents. Python-docx doesn't seem to have this feature.
It breaks down into 2 questions. 1) Is there a way to add hyperlinks to word docs using python? 2) Is there a way to manipulate footnotes in word docs using python?
Does anyone know how to do this or any part of this?

Hyperlinks can be added using the win32com package:
import win32com.client
#connect to Word (start it if it isn't already running)
wordapp = win32com.client.Dispatch("Word.Application")
#add a new document
doc = wordapp.Documents.Add()
#add some text and turn it into a hyperlink
para = doc.Paragraphs.Add()
para.Range.Text = "Adding hyperlinks in Microsoft word using python"
doc.Hyperlinks.Add(Anchor=para.Range, Address="http://stackoverflow.com/questions/34636391/adding-hyperlinks-in-microsoft-word-using-python")
#In theory you should be able to also pass in a TextToDisplay argument to the above call but I haven't been able to get this to work
#The workaround is to insert the link text into the document first and then convert it into a hyperlink

# How to insert hyperlinks into an existing MS Word document using win32com:
# Use the same call as in the example above to connect to Word:
wordapp = win32com.client.Dispatch("Word.Application")
# Open the input file where you want to insert the hyperlinks:
wordapp.Documents.Open("my_input_file.docx")
# Select the currently active document
doc = wordapp.ActiveDocument
# For my application, I want to replace references to identifiers in another
# document with the general format of "MSS-XXXX", where X is any digit, with
# hyperlinks to local html pages that capture the supporting details...
# First capture the entire document's content as text
docText = doc.Content.text
# Search for all identifiers that match the format criteria in the document:
mss_ids_to_link = re.findall('MSS-[0-9]+', docText)
# Now loop over all the identifier strings that were found, construct the link
# address for each html page to be linked, select the desired text where I want
# to insert the hyperlink, and then apply the link to the correct range of
# characters:
for linkIndex in range(len(mss_ids_to_link)):
current_string_to_link = mss_ids_to_link[linkIndex]
link_address = html_file_pathname + \
current_string_to_link + '.htm'
if wordapp.Selection.Find.Execute(FindText=current_string_to_link, \
Address=link_address) == True:
doc.Hyperlinks.Add(Anchor=wordapp.Selection.Range, \
Address=link_address)
# Save off the result:
doc.SaveAs('my_input_file.docx')

Add an image in a specific position in the document (.docx)?

I use Python-docx to generate Microsoft Word document.The user want that when he write for eg: "Good Morning every body,This is my %(profile_img)s do you like it?"
in a HTML field, i create a word document and i recuper the picture of the user from the database and i replace the key word %(profile_img)s by the picture of the user NOT at the END OF THE DOCUMENT. With Python-docx we use this instruction to add a picture:
document.add_picture('profile_img.png', width=Inches(1.25))
The picture is added to the document but the problem that it is added at the end of the document.
Is it impossible to add a picture in a specific position in a microsoft word document with python? I've not found any answers to this in the net but have seen people asking the same elsewhere with no solution.
Thanks (note: I'm not a hugely experiance programmer and other than this awkward part the rest of my code will very basic)

Quoting the python-docx documentation:
The Document.add_picture() method adds a specified picture to the end of the document in a paragraph of its own. However, by digging a little deeper into the API you can place text on either side of the picture in its paragraph, or both.
When we "dig a little deeper", we discover the Run.add_picture() API.
Here is an example of its use:
from docx import Document
from docx.shared import Inches
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
r.add_picture('/tmp/foo.jpg')
r.add_text(' do you like it?')
document.save('demo.docx')

well, I don't know if this will apply to you but here is what I've done to set an image in a specific spot to a docx document:
I created a base docx document (template document). In this file, I've inserted some tables without borders, to be used as placeholders for images. When creating the document, first I open the template, and update the file creating the images inside the tables. So the code itself is not much different from your original code, the only difference is that I'm creating the paragraph and image inside a specific table.
from docx import Document
from docx.shared import Inches
doc = Document('addImage.docx')
tables = doc.tables
p = tables[0].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('resized.png',width=Inches(4.0), height=Inches(.7))
p = tables[1].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('teste.png',width=Inches(4.0), height=Inches(.7))
doc.save('addImage.docx')

Here's my solution. It has the advantage on the first proposition that it surrounds the picture with a title (with style Header 1) and a section for additional comments. Note that you have to do the insertions in the reverse order they appear in the Word document.
This snippet is particularly useful if you want to programmatically insert pictures in an existing document.
from docx import Document
from docx.shared import Inches
# ------- initial code -------
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
picPath = 'D:/Development/Python/aa.png'
r.add_picture(picPath)
r.add_text(' do you like it?')
document.save('demo.docx')
# ------- improved code -------
document = Document()
p = document.add_paragraph('Picture bullet section', 'List Bullet')
p = p.insert_paragraph_before('')
r = p.add_run()
r.add_picture(picPath)
p = p.insert_paragraph_before('My picture title', 'Heading 1')
document.save('demo_better.docx')

This is adopting the answer written by Robᵩ while considering more flexible input from user.
My assumption is that the HTML field mentioned by Kais Dkhili (orignal enquirer) is already loaded in docx.Document(). So...
Identify where is the related HTML text in the document.
import re
## regex module
img_tag = re.compile(r'%\(profile_img\)s') # declare pattern
for _p in enumerate(document.paragraphs):
if bool(img_tag.match(_p.text)):
img_paragraph = _p
# if and only if; suggesting img_paragraph a list and
# use append method instead for full document search
break # lose the break if want full document search
Replace desired image into placeholder identified as img_tag = '%(profile_img)s'
The following code is after considering the text contains only a single run
May be changed accordingly if condition otherwise
temp_text = img_tag.split(img_paragraph.text)
img_paragraph.runs[0].text = temp_text[0]
_r = img_paragraph.add_run()
_r.add_picture('profile_img.png', width = Inches(1.25))
img_paragraph.add_run(temp_text[1])
and done. document.save() it if finalised.
In case you are wondering what to expect from the temp_text...
[In]
img_tag.split(img_paragraph.text)
[Out]
['This is my ', ' do you like it?']

I spend few hours in it. If you need to add images to a template doc file using python, the best solution is to use python-docx-template library.
Documentation is available here
Examples available in here

This is variation on a theme. Letting I be the paragraph number in the specific document then:
p = doc.paragraphs[I].insert_paragraph_before('\n')
p.add_run().add_picture('Fig01.png', width=Cm(15))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python docx not finding all sections - python

Related

Copy specific word section to new document in python

Adding page numbers in header or footer using versions 0.8.8+

How can I find and replace a text in footer of word document using "Python docx"

Adding hyperlinks in Microsoft word using python

Add an image in a specific position in the document (.docx)?

Categories

Resources