Add an image in a specific position in the document (.docx)? - python

I use Python-docx to generate Microsoft Word document.The user want that when he write for eg: "Good Morning every body,This is my %(profile_img)s do you like it?"
in a HTML field, i create a word document and i recuper the picture of the user from the database and i replace the key word %(profile_img)s by the picture of the user NOT at the END OF THE DOCUMENT. With Python-docx we use this instruction to add a picture:
document.add_picture('profile_img.png', width=Inches(1.25))
The picture is added to the document but the problem that it is added at the end of the document.
Is it impossible to add a picture in a specific position in a microsoft word document with python? I've not found any answers to this in the net but have seen people asking the same elsewhere with no solution.
Thanks (note: I'm not a hugely experiance programmer and other than this awkward part the rest of my code will very basic)

Quoting the python-docx documentation:
The Document.add_picture() method adds a specified picture to the end of the document in a paragraph of its own. However, by digging a little deeper into the API you can place text on either side of the picture in its paragraph, or both.
When we "dig a little deeper", we discover the Run.add_picture() API.
Here is an example of its use:
from docx import Document
from docx.shared import Inches
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
r.add_picture('/tmp/foo.jpg')
r.add_text(' do you like it?')
document.save('demo.docx')

well, I don't know if this will apply to you but here is what I've done to set an image in a specific spot to a docx document:
I created a base docx document (template document). In this file, I've inserted some tables without borders, to be used as placeholders for images. When creating the document, first I open the template, and update the file creating the images inside the tables. So the code itself is not much different from your original code, the only difference is that I'm creating the paragraph and image inside a specific table.
from docx import Document
from docx.shared import Inches
doc = Document('addImage.docx')
tables = doc.tables
p = tables[0].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('resized.png',width=Inches(4.0), height=Inches(.7))
p = tables[1].rows[0].cells[0].add_paragraph()
r = p.add_run()
r.add_picture('teste.png',width=Inches(4.0), height=Inches(.7))
doc.save('addImage.docx')

Here's my solution. It has the advantage on the first proposition that it surrounds the picture with a title (with style Header 1) and a section for additional comments. Note that you have to do the insertions in the reverse order they appear in the Word document.
This snippet is particularly useful if you want to programmatically insert pictures in an existing document.
from docx import Document
from docx.shared import Inches
# ------- initial code -------
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
picPath = 'D:/Development/Python/aa.png'
r.add_picture(picPath)
r.add_text(' do you like it?')
document.save('demo.docx')
# ------- improved code -------
document = Document()
p = document.add_paragraph('Picture bullet section', 'List Bullet')
p = p.insert_paragraph_before('')
r = p.add_run()
r.add_picture(picPath)
p = p.insert_paragraph_before('My picture title', 'Heading 1')
document.save('demo_better.docx')

This is adopting the answer written by Robᵩ while considering more flexible input from user.
My assumption is that the HTML field mentioned by Kais Dkhili (orignal enquirer) is already loaded in docx.Document(). So...
Identify where is the related HTML text in the document.
import re
## regex module
img_tag = re.compile(r'%\(profile_img\)s') # declare pattern
for _p in enumerate(document.paragraphs):
if bool(img_tag.match(_p.text)):
img_paragraph = _p
# if and only if; suggesting img_paragraph a list and
# use append method instead for full document search
break # lose the break if want full document search
Replace desired image into placeholder identified as img_tag = '%(profile_img)s'
The following code is after considering the text contains only a single run
May be changed accordingly if condition otherwise
temp_text = img_tag.split(img_paragraph.text)
img_paragraph.runs[0].text = temp_text[0]
_r = img_paragraph.add_run()
_r.add_picture('profile_img.png', width = Inches(1.25))
img_paragraph.add_run(temp_text[1])
and done. document.save() it if finalised.
In case you are wondering what to expect from the temp_text...
[In]
img_tag.split(img_paragraph.text)
[Out]
['This is my ', ' do you like it?']

I spend few hours in it. If you need to add images to a template doc file using python, the best solution is to use python-docx-template library.
Documentation is available here
Examples available in here

This is variation on a theme. Letting I be the paragraph number in the specific document then:
p = doc.paragraphs[I].insert_paragraph_before('\n')
p.add_run().add_picture('Fig01.png', width=Cm(15))

Related

Copy specific word section to new document in python

As the title says, I'm trying to copy a select section of text to a new word document. Basically, I have a bunch of annual reports with sections that are systemically named (i.e. Project 1, Project 2, etc..) I want to be able to search for a select section and copy that section into a report for the individual project. I've been looking through docx documentation and aspose.words documentation. This is the closest I've gotten to what I'm looking for but it's still not quite right:
# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
doc = aw.Document(docs_base.my_dir + "Big document.docx")
for i in range(0, doc.sections.count) :
# Split a document into smaller parts, in this instance, split by section.
section = doc.sections[i].clone()
newDoc = aw.Document()
newDoc.sections.clear()
newSection = newDoc.import_node(section, True).as_section()
newDoc.sections.add(newSection)
# Save each section as a separate document.
newDoc.save(docs_base.artifacts_dir + f"SplitDocument.by_sections_{i}.docx")
I suspect by saying section you mean just content tat starts with some heading paragraph. This does not correspond to Section node in Aspose.Words Document Object Model. Section in MS Word are used if you need to have different page setup or headers/footers within the document.
It is difficult to say without your document, but I suppose there is only one section in your document. It looks like you need to extract content form the document based on the styles.
Also, I have create a simple code example that demonstrates the basic technique:
import aspose.words as aw
# Geneare document with heading paragraphs (just for demonstrations purposes).
doc = aw.Document()
builder = aw.DocumentBuilder(doc)
for i in range(0, 5):
builder.paragraph_format.style_identifier = aw.StyleIdentifier.HEADING1
builder.writeln("Project {0}".format(i))
builder.paragraph_format.style_identifier = aw.StyleIdentifier.NORMAL
for j in range(0, 10):
builder.writeln("This is the project {0} content {1}, each project section will be extracted into a separate document.".format(i, j))
doc.save("C:\\Temp\\out.docx")
# Now split the document by heading paragraphs.
# Code is just for demonstration purposes and supposes there is only one section in the document.
subDocumentIndex = 0
subDocument = doc.clone(False).as_document()
for child in doc.first_section.body.child_nodes:
if child.node_type == aw.NodeType.PARAGRAPH and child.as_paragraph().paragraph_format.style_identifier == aw.StyleIdentifier.HEADING1:
if not subDocument.has_child_nodes:
subDocument.ensure_minimum()
else:
# save subdocument
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))
subDocumentIndex = subDocumentIndex+1
subDocument = doc.clone(False).as_document()
subDocument.ensure_minimum()
# Remove body content.
subDocument.first_section.body.remove_all_children()
# import current node to the subdocument.
dst_child = subDocument.import_node(child, True, aw.ImportFormatMode.USE_DESTINATION_STYLES)
subDocument.first_section.body.append_child(dst_child)
# save the last document.
subDocument.save("C:\\Temp\\sub_document_{0}.docx".format(subDocumentIndex))

Python docx not finding all sections

I need this code to be able to work with existing documents I have, but it is only finding one footer in each document. I need it to find all 3 (first page, even, and odd).
Here is the link to my test document: http://www.filedropper.com/testdoc_1
from docx import Document
document = Document('C:/Users/username/Desktop/SPECS/TEST DOC.docx')
print(len(document.sections))
for section in document.sections:
footer = section.footer
footer.paragraphs[0].text = footer.paragraphs[0].text.replace("ISSUED FOR CONSTRUCTION", "DESIGN DEVELOPMENT")
document.save('C:/Users/username/Desktop/SPECS/TEST DOC.docx')
So I answered my question by reading this documentation: https://python-docx.readthedocs.io/en/latest/api/section.html
The default is the odd page footer. To access the other footers (if you enabled different first page and even footers) you have to use the below code:
first_page_footer = section.first_page_footer
even_page_footer = section.even_page_footer
odd_page_footer = section.footer

Extracting .docx data, images and structure

Good day SO,
I have a task where I need to extract specific parts of a document template (For automation purposes). While I am able to traverse, and know the current position, of the document during traversal (via checking for Regex, keywords, etc.), I am unable to extract:
The structure of the document
Detect Images that are in-between text
Am I able to obtain, for example, an array of the structure of the document below?
['Paragraph1','Paragraph2','Image1','Image2','Paragraph3','Paragraph4','Image3','Image4']
My current implementation is shown below:
from docx import Document
document = docx.Document('demo.docx')
text = []
for x in document.paragraphs:
if x.text != '':
text.append(x.text)
Using the code above, I am able to obtain all the Text data from the document, but I am unable to detect the type of text (Header or Normal), and I am unable to detect any Images. I am currently using python-docx.
My main problem is to obtain the position of the image within the document (i.e. between paragraphs) so that I can re-create another document, using text and images extracted. This task requires me to know where the image appears in the document, and where to insert the image in the new document.
Any help is greatly appreciated, thank you :)
For extracting the structure of the paragraph and heading you can use the built-in objects in python-docx. Check this code.
from docx import Document
document = docx.Document('demo.docx')
text = []
style = []
for x in document.paragraphs:
if x.text != '':
style.append(x.style.name)
text.append(x.text)
with x.style.name you can get the styling of text in your document.
You can't get the information regarding images in python-docx. For that, you need to parse the xml. Check XML ouput by
for elem in document.element.getiterator():
print(elem.tag)
Let me know if you need anything else.
For extracting image name and its location use this.
tags = []
text = []
for t in doc.element.getiterator():
if t.tag in ['{http://schemas.openxmlformats.org/wordprocessingml/2006/main}r', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t','{http://schemas.openxmlformats.org/drawingml/2006/picture}cNvPr','{http://schemas.openxmlformats.org/wordprocessingml/2006/main}drawing']:
if t.tag == '{http://schemas.openxmlformats.org/drawingml/2006/picture}cNvPr':
print('Picture Found: ',t.attrib['name'])
tags.append('Picture')
text.append(t.attrib['name'])
elif t.text:
tags.append('text')
text.append(t.text)
You can check previous and next text from text list and their tag from the tag list.

How to wrap cell text in tables via docx library or xml?

I have been using python docx library and oxml to automate some changes to my tables in my word document. Unfortunately, no matter what I do, I cannot wrap the text in the table cells.
I managed to successfully manipulate 'autofit' and 'fit-text' properties of my table, but non of them contribute to the wrapping of the text in the cells. I can see that there is a "w:noWrap" in the xml version of my word document and no matter what I do I cannot manipulate and remove it. I believe it is responsible for the word wrapping in my table.
for example in this case I am adding a table. I can fit text in cell and set autofit to 'true' but cannot for life of me wrap the text:
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
doc = Document()
table = doc.add_table(5,5)
table.autofit = True # Does Autofit but not wrapping
tc = table.cell(0,0)._tc # As a test, fit text to cell 0,0
tcPr = tc.get_or_add_tcPr()
tcFitText = OxmlElement('w:tcFitText')
tcFitText.set(qn('w:val'),"true")
tcPr.append(tcFitText) #Does fitting but no wrapping
doc.save('demo.docx')
I would appreciate any help or hints.
The <w:noWrap> element appears to be a child of <w:tcPr>, the element that controls table cell properties.
You should be able to access it from the table cell element using XPath:
tc = table.cell(0, 0)._tc
noWraps = tc.xpath(".//w:noWrap")
The noWraps variable here will then be a list containing zero or more <w:noWrap> elements, in your case probably one.
Deleting it is probably the simplest approach, which you can accomplish like this:
if noWraps: # ---skip following code if list is empty---
noWrap = noWraps[0]
noWrap.getparent().remove(noWrap)
You can also take the approach of setting the value of the w:val attribute of the w:noWrap element, but then you have to get into specifying the Clark name of the attribute namespace, which adds some extra fuss and doesn't really produce a different outcome unless for some reason you want to keep that element around.

Adding hyperlinks in Microsoft word using python

I've been searching for the last few hours and cannot find a library that allows me to add hyperlinks to a word document using python. In my ideal world I'd be able to manipulate a word doc using python to add hyperlinks to footnotes which link to internal documents. Python-docx doesn't seem to have this feature.
It breaks down into 2 questions. 1) Is there a way to add hyperlinks to word docs using python? 2) Is there a way to manipulate footnotes in word docs using python?
Does anyone know how to do this or any part of this?
Hyperlinks can be added using the win32com package:
import win32com.client
#connect to Word (start it if it isn't already running)
wordapp = win32com.client.Dispatch("Word.Application")
#add a new document
doc = wordapp.Documents.Add()
#add some text and turn it into a hyperlink
para = doc.Paragraphs.Add()
para.Range.Text = "Adding hyperlinks in Microsoft word using python"
doc.Hyperlinks.Add(Anchor=para.Range, Address="http://stackoverflow.com/questions/34636391/adding-hyperlinks-in-microsoft-word-using-python")
#In theory you should be able to also pass in a TextToDisplay argument to the above call but I haven't been able to get this to work
#The workaround is to insert the link text into the document first and then convert it into a hyperlink
# How to insert hyperlinks into an existing MS Word document using win32com:
# Use the same call as in the example above to connect to Word:
wordapp = win32com.client.Dispatch("Word.Application")
# Open the input file where you want to insert the hyperlinks:
wordapp.Documents.Open("my_input_file.docx")
# Select the currently active document
doc = wordapp.ActiveDocument
# For my application, I want to replace references to identifiers in another
# document with the general format of "MSS-XXXX", where X is any digit, with
# hyperlinks to local html pages that capture the supporting details...
# First capture the entire document's content as text
docText = doc.Content.text
# Search for all identifiers that match the format criteria in the document:
mss_ids_to_link = re.findall('MSS-[0-9]+', docText)
# Now loop over all the identifier strings that were found, construct the link
# address for each html page to be linked, select the desired text where I want
# to insert the hyperlink, and then apply the link to the correct range of
# characters:
for linkIndex in range(len(mss_ids_to_link)):
current_string_to_link = mss_ids_to_link[linkIndex]
link_address = html_file_pathname + \
current_string_to_link + '.htm'
if wordapp.Selection.Find.Execute(FindText=current_string_to_link, \
Address=link_address) == True:
doc.Hyperlinks.Add(Anchor=wordapp.Selection.Range, \
Address=link_address)
# Save off the result:
doc.SaveAs('my_input_file.docx')

Categories