python-docx - How do I add an image at a specific position? - python

I have a pre-existing document that I need to add a picture to. The picture needs to be inside a frame in the upper right corner of the document, and that frame is already part of the document. As far as I understand, all I need to do is add a run to the paragraph inside the frame (which is a v:shape element) and then add a picture to that run; the picture should then take on the position of the surrounding element. The only issue is that I can only get a paragraph, run or picture added to the end of the document, not to a specific element farther down in the document. Also, whereas I can set a size for the image, there is no option to set its position on the page. Is there any way to do this? Again, what I want is to add a picture to a document at a specific x, y position on the page.
Edit:
Here's my latest attempt. I don't know any other way than to drill down deep into the document to get the element I want to add a picture to:
for r in doc.paragraphs[0].runs:
    for r2 in r._r.iterchildren():
        if r2.tag == '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pict':
            for r3 in r2.iterchildren():
                if r3.tag == '{urn:schemas-microsoft-com:vml}shape':
                    for r4 in r3.iterchildren():
                        if r4.tag == '{urn:schemas-microsoft-com:vml}textbox':
                            for r5 in r4.iterchildren():
                                if r5.tag == '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}txbxContent':
                                    for r6 in r5.iterchildren():
                                        if r6.tag == '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p':
                                            #pic = doc.add_picture('image.jpg', width=1508760, height=1905000)
                                            #pic_xml = pic._inline.xml
                                            #r6.append(pic._inline)
                                            pr = r6.add_run()
                                            pr.add_picture('image.jpg', width=1508760, height=1905000)
I've read examples that say to add a run to a paragraph and then add a picture to that run. This is what I'm trying, but all I get is an error like this: "'CT_P' object has no attribute 'add_run'"
The thing is, you can add a run to a regular paragraph, so this error suggests that the element I'm drilling down to is a different object type than a regular paragraph object. Does that make sense? Do I have to convert this element somehow?

Related

How to update Span (Bokeh) using ColumnDataSource?

I am trying to update a Span using ColumnDataSource, but the information is not being passed on to the source. Span unfortunately does not have a "source" parameter, so is there a better way?
I have defined my sources, figure and line like so:
m1_source = ColumnDataSource(data=dict(x1=[], y1=[]))
m1_spans = ColumnDataSource(data=dict(x1=[]))
p = figure(x_axis_type="datetime", title="", sizing_mode="fixed", height = 500, width = 1400)
p.line(x ="x1", y="y1", color = 'blue', source=m1_source)
Then I have a for loop that should ideally plot multiple spans, where each 'i' is a separate timestamp:
for i in m1_spans.data['x1']:
    p.add_layout(Span(location=i, dimension='height', line_color='red', line_dash='solid', line_width=1))
This is taken from my update() function:
m1_source.data = dict(
    x1=machine_1_vals['mpTimestamp'],
    y1=machine_1_vals['Value'])
m1_spans.data = dict(x1=ToolsDF.loc[ToolsDF['Value'] == float(vals['Tool_val'])]['Timestamp'])
I have checked this, and m1_spans does successfully return multiple timestamps, so the error shouldn't be here.
I am confused because my p.line updates successfully with no issues, but it has a source parameter, while Span does not.
I would really appreciate any advice on how to solve this issue.
If I should have supplied more information, I apologize and can update as needed; I just tried to keep it brief.
Thanks.
Span objects do not currently have an ability to be "powered" by a ColumnDataSource. Each Span only draws one span, specified by its own location property.
You will need to update the location property individually on each Span object in order to update it. Alternatively, if you absolutely want to be able to drive updates through a CDS, you could look at using a multi_line, segment, or ray glyph instead. Those all have different ways to configure their coordinates, so you'd have to see which is most convenient to your use-case. But they all come with one trade-off, which is that none of them have the full "infinite extent" that a Span supports.
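Following the segment suggestion, a minimal sketch of keeping the lines in a ColumnDataSource: each timestamp becomes one vertical segment from y_bottom to y_top. Because a segment needs finite end points, you have to pick those y values yourself (for example the min and max of the plotted values), which is exactly the trade-off described above. The helper span_segments and the column names x/y0/y1 are hypothetical; the Bokeh wiring is shown only in comments and assumes the x0/y0/x1/y1 arguments of the figure's segment glyph.

```python
def span_segments(timestamps, y_bottom, y_top):
    """Hypothetical helper: ColumnDataSource data for one vertical segment per timestamp.

    Each segment runs from (t, y_bottom) to (t, y_top). Unlike a Span it does
    not extend infinitely, so the y limits must be chosen explicitly.
    """
    xs = list(timestamps)  # accepts a plain list or a pandas Series
    return {"x": xs, "y0": [y_bottom] * len(xs), "y1": [y_top] * len(xs)}


# Wiring in Bokeh (sketch, reusing the question's names):
# m1_spans = ColumnDataSource(data=span_segments([], 0, 1))
# p.segment(x0="x", y0="y0", x1="x", y1="y1", source=m1_spans,
#           line_color="red", line_width=1)
# ...and in update():
# ts = ToolsDF.loc[ToolsDF['Value'] == float(vals['Tool_val'])]['Timestamp']
# m1_spans.data = span_segments(ts, machine_1_vals['Value'].min(),
#                               machine_1_vals['Value'].max())
```

Since the glyph is created once with the source, every later assignment to m1_spans.data redraws the lines, just like p.line does with m1_source.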

Can a text be searched Blockwise in a PDF using PyMuPDF?

page.getTextBlocks()
Output
[(42.5, 86.45002746582031, 523.260009765625, 100.22002410888672, TEXT, 0, 0),
(65.75, 103.4000244140625, 266.780029296875, 159.59010314941406, TEXT, 1, 0),
(48.5, 86.123456, 438.292048492, 100.92920404974, TEXT, 0, 0)]
(x0, y0, x1, y1, "lines in block", block_type, block_no)
My main aim is:
to search for a text in a PDF and highlight it
The text to be searched can occur on a page any number of times. Using tp.search(text,hit_max=1) limits the maximum number of occurrences, but it doesn't solve the problem, because it selects the first occurrence of the text, whereas for me the second or third occurrence may be the important one.
My idea is:
getTextBlocks extracts the text as shown above. Using this information, specifically the block_no, I want to run page.searchFor for that particular block only. Logically it should be possible, but in practice I need help with how to do it.
I would appreciate any input on achieving the main aim.
Thanks
As a preface, let me say that your question would be better suited to the issue page of my repository.
Page.searchFor() finds any number of occurrences of a text on the page. The restriction is the number of hits, which has a limit you must specify in the call, but you can use any number here (take 100, for example). This method extracts no text, ignores character casing, and also supports non-horizontal text or text spread across multiple lines. Its output can be used directly to create text marker annotations and more.
You are of course free to extract text by using variations of Page.getText(option) and then apply your finesse to find what you want in the output. option may be "text", "words", "blocks", "dict", "rawdict", "html", "xhtml", or "xml". Each output has its pros and cons, obviously. Many of the variants come with text position information, or font information including text color, etc.
But as I said, it is up to you how you locate things. Let me suggest again that we continue this conversation on the GitHub repo issue page, where I can better point to other resources. Or feel free to use my private e-mail.
If your question means to (1) locate text occurrences, and then (2) link each occurrence to a text block number, then just make a list of block rectangles and check each occurrence whether it is contained in a block rectangle:
for j, rect in enumerate(page.searchFor(text,...)):
    for i, bbox in enumerate(block_rectangles):
        if rect in bbox:
            print("occurrence %i is contained in block %i" % (j, i))
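The same containment test can be written on plain coordinates, which makes it easy to pick, say, the second hit inside block 1 and highlight only that one. A minimal sketch, assuming hits and blocks are (x0, y0, x1, y1) tuples (for instance built from each fitz.Rect's x0, y0, x1, y1 attributes, and the first four entries of each getTextBlocks() tuple). The function name hits_by_block and the tol parameter are hypothetical; blocks are keyed by their position in the list, and a small tolerance may help if a hit rectangle sticks out slightly past its block's bbox.

```python
def hits_by_block(hits, blocks, tol=0.0):
    """Hypothetical helper: map each block index to the indices of the hits it contains.

    hits:   sequence of (x0, y0, x1, y1) rectangles, e.g. from page.searchFor()
    blocks: sequence whose items start with (x0, y0, x1, y1), e.g. getTextBlocks()
    tol:    slack allowed on every side when testing containment
    A hit contained in several blocks is assigned to the first one only.
    """
    grouped = {}
    for j, (hx0, hy0, hx1, hy1) in enumerate(hits):
        for i, block in enumerate(blocks):
            bx0, by0, bx1, by1 = block[:4]
            if (hx0 >= bx0 - tol and hy0 >= by0 - tol
                    and hx1 <= bx1 + tol and hy1 <= by1 + tol):
                grouped.setdefault(i, []).append(j)
                break
    return grouped


# Example use with PyMuPDF (sketch):
# hit_rects = page.searchFor(text, hit_max=100)
# hits = [(r.x0, r.y0, r.x1, r.y1) for r in hit_rects]
# groups = hits_by_block(hits, page.getTextBlocks())
# target = hit_rects[groups[1][1]]  # second occurrence inside block 1
```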

How can I accurately set the new cursor positions after text replacements have been made?

I am trying to adapt a plugin for automated text replacement in Sublime Text 3. What I want it to do is paste text from the clipboard and make some automatic text substitutions:
import sublime
import sublime_plugin
import re
class PasteAndEscapeCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        # Position of cursor for all selections
        before_selections = [sel for sel in self.view.sel()]
        # Paste from clipboard
        self.view.run_command('paste')
        # Position of cursor for all selections after paste
        after_selections = [sel for sel in self.view.sel()]
        # Define a new region based on pre and post paste cursor positions
        new_selections = list()
        delta = 0
        for before, after in zip(before_selections, after_selections):
            new = sublime.Region(before.begin() + delta, after.end())
            delta = after.end() - before.end()
            new_selections.append(new)
        # Clear any existing selections
        self.view.sel().clear()
        # Select the saved region
        self.view.sel().add_all(new_selections)
        # Replace text accordingly
        for region in self.view.sel():
            # Get the text from the selected region
            text = self.view.substr(region)
            # Make the required edits on the text
            text = text.replace("\\","\\\\")
            text = text.replace("_","\\_")
            text = text.replace("*","\\*")
            # Paste the text back to the saved region
            self.view.replace(edit, region, text)
        # Clear selections and set cursor position
        self.view.sel().clear()
        self.view.sel().add_all(after_selections)
This works for the most part, except that I need to get the new region for the edited text. The cursor is placed at the end of the pasted text, but since the replacements always make the text longer, the final position is inaccurate.
I know very little about Python for Sublime and, like most others, this is my first plugin.
How do I set the cursor position to account for the size changes in the text? I know I need to do something with the after_selections list, but I am not sure how to create new regions, since they were created from selections that are cleared in an earlier step.
I feel that I am getting close with
# Add the updated region to the selection
self.view.sel().subtract(region)
self.view.sel().add(sublime.Region(region.begin()+len(text)))
For some reason I don't yet understand, this places the cursor at both the beginning and the end of the replaced text. My guess is that I am removing the regions one by one but forgetting some "initial" region that also exists.
Note
I am pretty sure the double loop in the code in the question here is redundant, but that is outside the scope of the question.
I think your own answer to your question is a good one and probably the way I would go if I was to do something like this in this manner.
In particular, since the plugin modifies the text on the fly and makes it longer, the first solution that presents itself, other than what your own answer does, would be to track how much the length of the text changes with each replacement so you can adjust the selections accordingly.
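That length-tracking idea can be sketched as a small pure function: walking the regions in buffer order, each replacement shifts every later region by (new length minus old length), and the caret for a region lands at its shifted start plus the length of its new text. The helper name shifted_cursor_positions and the plain (begin, end) tuples are assumptions for illustration. This bookkeeping matters when you work from a list of regions saved before editing, because Region objects stored in a plain Python list are not updated when the buffer changes.

```python
def shifted_cursor_positions(regions, replacements):
    """Hypothetical helper: caret offsets just after each replaced region.

    regions:      (begin, end) offsets in the ORIGINAL buffer, sorted, non-overlapping
    replacements: the new text for each region, in the same order
    """
    positions = []
    delta = 0  # total growth caused by the replacements made so far
    for (begin, end), new_text in zip(regions, replacements):
        new_begin = begin + delta
        positions.append(new_begin + len(new_text))
        delta += len(new_text) - (end - begin)
    return positions


# In the plugin (sketch; `texts` is a hypothetical list of the replaced strings):
# saved = [(r.begin(), r.end()) for r in new_selections]
# ...perform the replacements, collecting each new string in `texts`...
# self.view.sel().clear()
# self.view.sel().add_all([sublime.Region(p) for p in shifted_cursor_positions(saved, texts)])
```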
Since I can't really provide a better answer to your question than the one you already came up with, here's an alternative solution to this instead:
import sublime
import sublime_plugin
class PasteAndEscapeCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        org_text = sublime.get_clipboard()
        text = org_text.replace("\\","\\\\")
        text = text.replace("_","\\_")
        text = text.replace("*","\\*")
        sublime.set_clipboard(text)
        self.view.run_command("paste")
        sublime.set_clipboard(org_text)
This modifies the text on the clipboard to be quoted the way you want it to be quoted so that it can just use the built in paste command to perform the paste.
The last part puts the original clipboard text back on the clipboard, which for your purposes may or may not be needed.
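One detail every version here relies on is the order of the replacements: backslashes must be doubled first, otherwise the backslashes just inserted in front of _ and * would be doubled as well. A small standalone sketch of that escaping step (the function name escape_text is hypothetical):

```python
def escape_text(text):
    """Hypothetical helper: escape backslashes, underscores and asterisks."""
    text = text.replace("\\", "\\\\")  # must run first
    text = text.replace("_", "\\_")
    text = text.replace("*", "\\*")
    return text


# Escaping "_" first would turn "a_b" into "a\_b" and then, when the
# backslashes are doubled, into "a\\_b".
```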
So, one approach is to create new regions as the replaced text is produced, placing each new cursor at the region's start plus the length of its replaced text. Then, once the loop is complete, clear all existing selections and set the new ones created in the replacement loop.
# Replace text accordingly
new_replacedselections = list()
for region in self.view.sel():
    # Get the text from the selected region
    text = self.view.substr(region)
    # Make the required edits on the text
    text = text.replace("\\","\\\\") # Double up slashes
    text = text.replace("*","\\*") # Escape *
    text = text.replace("_","\\_") # Escape _
    # Paste the text back to the saved region
    self.view.replace(edit, region, text)
    # Add the updated region to the collection
    new_replacedselections.append(sublime.Region(region.begin()+len(text)))
# Set the selection positions after the new insertions.
self.view.sel().clear()
self.view.sel().add_all(new_replacedselections)

Word & Python - Create Table of Contents

I'm using the win32com.client module from pywin32 with Python to build a Word document. I have tried a good number of methods to generate a ToC, but all have failed.
I think what I want to do is call the ActiveDocument object and create one with something like this example from the MSDN page:
Set myRange = ActiveDocument.Range(Start:=0, End:=0)
ActiveDocument.TablesOfContents.Add Range:=myRange, _
UseFields:=False, UseHeadingStyles:=True, _
LowerHeadingLevel:=3, _
UpperHeadingLevel:=1
Except in Python it would be something like:
wordObject.ActiveDocument.TablesOfContents.Add(Range=???, UseFields=False, UseHeadingStyles=True, LowerHeadingLevel=3, UpperHeadingLevel=1)
I've built everything so far using the 'Selection' object (example below) and wish to add this ToC after the first page break.
Here's a sample of what the document looks like:
import win32com.client
objWord = win32com.client.Dispatch("Word.Application")
objDoc = objWord.Documents.Open('pathtotemplate.docx')
objSel = objWord.Selection
# These seem to work, but I don't know why...
objWord.ActiveDocument.Sections(1).Footers(1).PageNumbers.Add(1, True)
objWord.ActiveDocument.Sections(1).Footers(1).PageNumbers.NumberStyle = 57
objSel.Style = objWord.ActiveDocument.Styles("Heading 1")
objSel.TypeText("TITLE PAGE AND STUFF")
objSel.InsertParagraph()
objSel.TypeText("Some data or another")
objSel.TypeParagraph()
objWord.Selection.InsertBreak()
####INSERT TOC HERE####
####INSERT TOC HERE####
Any help would be greatly appreciated! In a perfect world I'd use the default first option available from the Word GUI, but that seems to point to a file and be harder to access (something about templates).
Thanks
Manually edit your template in Word: add the ToC (which will initially be empty), any intro material, headers/footers, etc. Then, where you want your text content inserted (i.e. after the ToC), put a uniquely named bookmark. In your code, create a new document based on the template (or open the template and then save it under a different name), find the bookmark, and insert your content there. Save to a different filename.
This approach has all sorts of advantages: you can format your template in Word rather than writing all the formatting details in code, so when someone says they want the Normal font to be bigger/smaller/pink, you can do it just by editing the template. Make sure to use styles in your code, and only apply direct formatting when it specifically differs from the default style.
I'm not sure how to make sure the ToC is actually generated; it might be updated automatically on every save.
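If you would rather build the ToC from code, as in the question, a minimal sketch: insert it at the current Selection (right after the page break in the question's code) with TablesOfContents.Add, then call Update() on the returned TableOfContents so it is populated. This assumes the Word object model's TablesOfContents.Add and TableOfContents.Update methods and passes Add's first four arguments positionally (Range, UseHeadingStyles, UpperHeadingLevel, LowerHeadingLevel) rather than relying on named arguments over COM. The helper name insert_toc_at_selection is hypothetical.

```python
def insert_toc_at_selection(word_app, upper_level=1, lower_level=3):
    """Hypothetical helper: add a ToC at the current selection of a Word.Application object."""
    doc = word_app.ActiveDocument
    rng = word_app.Selection.Range  # where the caret currently is
    # Positional order assumed: Range, UseHeadingStyles, UpperHeadingLevel, LowerHeadingLevel
    toc = doc.TablesOfContents.Add(rng, True, upper_level, lower_level)
    toc.Update()  # populate it from the headings that exist so far
    return toc


# At the ####INSERT TOC HERE#### point (Windows with Word and pywin32):
# insert_toc_at_selection(objWord)
# ...and once all later headings have been typed, refresh it again:
# objWord.ActiveDocument.TablesOfContents(1).Update()
```

Because the body headings are typed after the ToC is inserted, the second Update() at the end is what makes them appear in it.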

Removing Paragraph From Cell In Python-Docx

I am attempting to create a table with a two-row header that uses a simple template format for all of the styling. The two-row header is required because I have headers that are the same under two primary categories. It appears that the only way to handle this within Word, so that the document formats and flows with a repeating header across pages, is to nest a two-row table into the header row of a main content table.
In python-docx a table cell is always created with a single empty paragraph element. For my use case I need to remove this empty paragraph element entirely, not simply clear it with an empty string; otherwise I get a line break above my nested table that ruins the illusion of a single table.
So the question is: how do you remove the empty paragraph?
If you know of a better way to handle the two-row header implementation, that would also be appreciated.
While Paragraph.delete() is not implemented yet in python-docx, there is a workaround function documented here: https://github.com/python-openxml/python-docx/issues/33#issuecomment-77661907
Note that a table cell must always end with a paragraph, so you'll need to add an empty one after your table; otherwise I believe you'll get a so-called "repair-step" error when you try to load the document.
It's probably worth trying without the extra paragraph just to confirm; I expect it would look better without it, but the last time I tried that I got the error.
As @scanny said before, you can delete a paragraph by passing it to a self-defined delete function.
I just want to add a supplement for the case where you want to delete multiple paragraphs.
import docx
def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None
def remove_multiple_para(doc):
    i = 0
    while i < len(doc.paragraphs):
        if 'xxxx' in doc.paragraphs[i].text:
            # delete the related 4 lines (i-1 .. i+2), clamped to the document bounds
            for j in range(min(i + 2, len(doc.paragraphs) - 1), max(i - 2, -1), -1):
                delete_paragraph(doc.paragraphs[j])
        i += 1
    doc.save('outputDoc.docx')
doc = docx.Document('./inputDoc.docx')
remove_multiple_para(doc)
