How to get Text coordinates using PyUNO with OpenOffice writer

How to get Text coordinates using PyUNO with OpenOffice writer - python

I have a python script that successfully does a search and replace in an OpenOffice Writer document using PyUNO. I wanna to ask how to get the coordinate of found text?
import string
search = document.createSearchDescriptor()
search.SearchString = unicode('find')
#search.SearchCaseSensitive = True
#search.SearchWords = True
found = document.findFirst(search)
if found:
#log.debug('Found %s' % find)
## any code here help to get the coordinate of found text?
pass

This is some StarBASIC code to find the page number of a search expression in a Writer document:
SUB find_page_number()
oDoc = ThisComponent
oViewCursor = oDoc.getCurrentController().getViewCursor()
oSearchFor = "My Search example"
oSearch = oDoc.createSearchDescriptor()
With oSearch
.SearchRegularExpression = False
.SearchBackwards = False
.setSearchString(oSearchFor)
End With
oFirstFind = oDoc.findFirst(oSearch)
If NOT isNull(oFirstFind) Then
oViewCursor.gotoRange(oFirstFind, False)
MsgBox oViewCursor.getPage()
Else
msgbox "not found: " & oSearchFor
End If
Hope this helps you

Related

Extract DOCX Comments

I'm a teacher. I want a list of all the students who commented on the essay I assigned, and what they said. The Drive API stuff was too challenging for me, but I figured I could download them as a zip and parse the XML.
The comments are tagged in w:comment tags, with w:t for the comment text and . It should be easy, but XML (etree) is killing me.
via the tutorial (and official Python docs):
z = zipfile.ZipFile('test.docx')
x = z.read('word/comments.xml')
tree = etree.XML(x)
Then I do this:
children = tree.getiterator()
for c in children:
print(c.attrib)
Resulting in this:
{}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}author': 'Joe Shmoe', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id': '1', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date': '2017-11-17T16:58:27Z'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidR': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidDel': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidP': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidRDefault': '00000000', '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rsidRPr': '00000000'}
{}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val': '0'}
And after this I am totally stuck. I've tried element.get() and element.findall() with no luck. Even when I copy/paste the value ('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val'), I get None in return.
Can anyone help?

You got remarkably far considering that OOXML is such a complex format.
Here's some sample Python code showing how to access the comments of a DOCX file via XPath:
from lxml import etree
import zipfile
ooXMLns = {'w':'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
def get_comments(docxFileName):
docxZip = zipfile.ZipFile(docxFileName)
commentsXML = docxZip.read('word/comments.xml')
et = etree.XML(commentsXML)
comments = et.xpath('//w:comment',namespaces=ooXMLns)
for c in comments:
# attributes:
print(c.xpath('#w:author',namespaces=ooXMLns))
print(c.xpath('#w:date',namespaces=ooXMLns))
# string value of the comment:
print(c.xpath('string(.)',namespaces=ooXMLns))

Thank you #kjhughes for this amazing answer for extracting all the comments from the document file. I was facing same issue like others in this thread to get the text that the comment relates to. I took the code from #kjhughes as a base and try to solve this using python-docx. So here is my take at this.
Sample document.
I will extract the comment and the paragraph which it was referenced in the document.
from docx import Document
from lxml import etree
import zipfile
ooXMLns = {'w':'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
#Function to extract all the comments of document(Same as accepted answer)
#Returns a dictionary with comment id as key and comment string as value
def get_document_comments(docxFileName):
comments_dict={}
docxZip = zipfile.ZipFile(docxFileName)
commentsXML = docxZip.read('word/comments.xml')
et = etree.XML(commentsXML)
comments = et.xpath('//w:comment',namespaces=ooXMLns)
for c in comments:
comment=c.xpath('string(.)',namespaces=ooXMLns)
comment_id=c.xpath('#w:id',namespaces=ooXMLns)[0]
comments_dict[comment_id]=comment
return comments_dict
#Function to fetch all the comments in a paragraph
def paragraph_comments(paragraph,comments_dict):
comments=[]
for run in paragraph.runs:
comment_reference=run._r.xpath("./w:commentReference")
if comment_reference:
comment_id=comment_reference[0].xpath('#w:id',namespaces=ooXMLns)[0]
comment=comments_dict[comment_id]
comments.append(comment)
return comments
#Function to fetch all comments with their referenced paragraph
#This will return list like this [{'Paragraph text': [comment 1,comment 2]}]
def comments_with_reference_paragraph(docxFileName):
document = Document(docxFileName)
comments_dict=get_document_comments(docxFileName)
comments_with_their_reference_paragraph=[]
for paragraph in document.paragraphs:
if comments_dict:
comments=paragraph_comments(paragraph,comments_dict)
if comments:
comments_with_their_reference_paragraph.append({paragraph.text: comments})
return comments_with_their_reference_paragraph
if __name__=="__main__":
document="test.docx" #filepath for the input document
print(comments_with_reference_paragraph(document))
Output for the sample document look like this
I have done this at a paragraph level. This could be done at a python-docx run level as well.
Hopefully it will be of help.

I used Word Object Model to extract comments with replies from a Word document. Documentation on Comments object can be found here. This documentation uses Visual Basic for Applications (VBA). But I was able to use the functions in Python with slight modifications. Only issue with Word Object Model is that I had to use win32com package from pywin32 which works fine on Windows PC, but I'm not sure if it will work on macOS.
Here's the sample code I used to extract comments with associated replies:
import win32com.client as win32
from win32com.client import constants
word = win32.gencache.EnsureDispatch('Word.Application')
word.Visible = False
filepath = "path\to\file.docx"
def get_comments(filepath):
doc = word.Documents.Open(filepath)
doc.Activate()
activeDoc = word.ActiveDocument
for c in activeDoc.Comments:
if c.Ancestor is None: #checking if this is a top-level comment
print("Comment by: " + c.Author)
print("Comment text: " + c.Range.Text) #text of the comment
print("Regarding: " + c.Scope.Text) #text of the original document where the comment is anchored
if len(c.Replies)> 0: #if the comment has replies
print("Number of replies: " + str(len(c.Replies)))
for r in range(1, len(c.Replies)+1):
print("Reply by: " + c.Replies(r).Author)
print("Reply text: " + c.Replies(r).Range.Text) #text of the reply
doc.Close()

If you want also the text the comments relates to :
def get_document_comments(docxFileName):
comments_dict = {}
comments_of_dict = {}
docx_zip = zipfile.ZipFile(docxFileName)
comments_xml = docx_zip.read('word/comments.xml')
comments_of_xml = docx_zip.read('word/document.xml')
et_comments = etree.XML(comments_xml)
et_comments_of = etree.XML(comments_of_xml)
comments = et_comments.xpath('//w:comment', namespaces=ooXMLns)
comments_of = et_comments_of.xpath('//w:commentRangeStart', namespaces=ooXMLns)
for c in comments:
comment = c.xpath('string(.)', namespaces=ooXMLns)
comment_id = c.xpath('#w:id', namespaces=ooXMLns)[0]
comments_dict[comment_id] = comment
for c in comments_of:
comments_of_id = c.xpath('#w:id', namespaces=ooXMLns)[0]
parts = et_comments_of.xpath(
"//w:r[preceding-sibling::w:commentRangeStart[#w:id=" + comments_of_id + "] and following-sibling::w:commentRangeEnd[#w:id=" + comments_of_id + "]]",
namespaces=ooXMLns)
comment_of = ''
for part in parts:
comment_of += part.xpath('string(.)', namespaces=ooXMLns)
comments_of_dict[comments_of_id] = comment_of
return comments_dict, comments_of_dict

Python - LibreOffice Calc - Find & Replace with Regular Expression

I try to code a Find & Replace method with Python in LibreOffice's Calc to replace all the ".+" with "&" (in a single column - not so important) - unfortunately, even a standard Find & Replace method seems to be impossible (to me). That's what I have up to now:
import uno
def search()
desktop = XSCRIPTCONTEXT.getDesktop()
document = XSCRIPTCONTEXT.getDocument()
ctx = uno.getComponentContext()
sm = ctx.ServiceManager
dispatcher = sm.createInstanceWithContext("com.sun.star.frame.DispatchHelper", ctx)
model = desktop.getCurrentComponent()
doc = model.getCurrentController()
sheet = model.Sheets.getByIndex(0)
replace = sheet.createReplaceDescriptor()
replace.SearchRegularExpression = True
replace.SearchString = ".+$"
replace.ReplaceString ="&"
return None
And what happens: totally nothing! I will be happy and thankful for every hint, sample code and motivating words!

This code changes all non-empty cells in column A to &:
def calc_search_and_replace():
desktop = XSCRIPTCONTEXT.getDesktop()
model = desktop.getCurrentComponent()
sheet = model.Sheets.getByIndex(0)
COLUMN_A = 0
cellRange = sheet.getCellRangeByPosition(COLUMN_A, 0, COLUMN_A, 65536);
replace = cellRange.createReplaceDescriptor()
replace.SearchRegularExpression = True
replace.SearchString = r".+$"
replace.ReplaceString = r"\&"
cellRange.replaceAll(replace)
Notice that the code calls replaceAll to actually do something. Also, from the User Guide:
& will insert the same string found with the Search RegExp.
So the replace string needs to be literal -- \&.

Pyuno indexing issue that I would like an explanation for

The following python libreoffice Uno macro works but only with the try..except statement.
The macro allows you to select text in a writer document and send it to a search engine in your default browser.
The issue, is that if you select a single piece of text,oSelected.getByIndex(0) is populated but if you select multiple pieces of text oSelected.getByIndex(0) is not populated. In this case the data starts at oSelected.getByIndex(1) and oSelected.getByIndex(0) is left blank.
I have no idea why this should be and would love to know if anyone can explain this strange behaviour.
#!/usr/bin/python
import os
import webbrowser
from configobj import ConfigObj
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY
from com.sun.star.awt.MessageBoxButtons import DEFAULT_BUTTON_OK, DEFAULT_BUTTON_CANCEL, DEFAULT_BUTTON_RETRY, DEFAULT_BUTTON_YES, DEFAULT_BUTTON_NO, DEFAULT_BUTTON_IGNORE
from com.sun.star.awt.MessageBoxType import MESSAGEBOX, INFOBOX, WARNINGBOX, ERRORBOX, QUERYBOX
def fs3Browser(*args):
#get the doc from the scripting context which is made available to all scripts
desktop = XSCRIPTCONTEXT.getDesktop()
model = desktop.getCurrentComponent()
doc = XSCRIPTCONTEXT.getDocument()
parentwindow = doc.CurrentController.Frame.ContainerWindow
oSelected = model.getCurrentSelection()
oText = ""
try:
for i in range(0,4,1):
print ("Index No ", str(i))
try:
oSel = oSelected.getByIndex(i)
print (str(i), oSel.getString())
oText += oSel.getString()+" "
except:
break
except AttributeError:
mess = "Do not select text from more than one table cell"
heading = "Processing error"
MessageBox(parentwindow, mess, heading, INFOBOX, BUTTONS_OK)
return
lookup = str(oText)
special_c =str.maketrans("","",'!|##"$~%&/()=?+*][}{-;:,.<>')
lookup = lookup.translate(special_c)
lookup = lookup.strip()
configuration_dir = os.environ["HOME"]+"/fs3"
config_filename = configuration_dir + "/fs3.cfg"
if os.access(config_filename, os.R_OK):
cfg = ConfigObj(config_filename)
#define search engine from the configuration file
try:
searchengine = cfg["control"]["ENGINE"]
except:
searchengine = "https://duckduckgo.com"
if 'duck' in searchengine:
webbrowser.open_new('https://www.duckduckgo.com//?q='+lookup+'&kj=%23FFD700 &k7=%23C9C4FF &ia=meanings')
else:
webbrowser.open_new('https://www.google.com/search?/&q='+lookup)
return None
def MessageBox(ParentWindow, MsgText, MsgTitle, MsgType, MsgButtons):
ctx = XSCRIPTCONTEXT.getComponentContext()
sm = ctx.ServiceManager
si = sm.createInstanceWithContext("com.sun.star.awt.Toolkit", ctx)
mBox = si.createMessageBox(ParentWindow, MsgType, MsgButtons, MsgTitle, MsgText)
mBox.execute()

Your code is missing something. This works without needing an extra try/except clause:
selected_strings = []
try:
for i in range(oSelected.getCount()):
oSel = oSelected.getByIndex(i)
if oSel.getString():
selected_strings.append(oSel.getString())
except AttributeError:
# handle exception...
return
result = " ".join(selected_strings)
To answer your question about the "strange behaviour," it seems pretty straightforward to me. If the 0th element is empty, then there are multiple selections which may need to be handled differently.

Reading specific test steps from Quality Center with python

I am working with Quality Center via OTA COM library. I figured out how to connect to server, but I am lost in OTA documentation on how to work with it. What I need is to create a function which takes a test name as an input and returns number of steps in this test from QC.
For now I am this far in this question.
import win32com
from win32com.client import Dispatch
# import codecs #to store info in additional codacs
import re
import json
import getpass #for password
qcServer = "***"
qcUser = "***"
qcPassword = getpass.getpass('Password: ')
qcDomain = "***"
qcProject = "***"
td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")
#Starting to connect
td.InitConnectionEx(qcServer)
td.Login(qcUser,qcPassword)
td.Connect(qcDomain, qcProject)
if td.Connected == True:
print "Connected to " + qcProject
else:
print "Connection failed"
#Path = "Subject\Regression\C.001_Band_tones"
mg=td.TreeManager
npath="Subject\Regression"
tsFolder = td.TestSetTreeManager.NodeByPath(npath)
print tsFolder
td.Disconnect
td.Logout
print "Disconnected from " + qcProject
Any help on descent python examples or tutorials will be highly appreciated. For now I found this and this, but they doesn't help.

Using the OTA API to get data from Quality Center normally means to get some element by path, create a factory and then use the factory to get search the object. In your case you need the TreeManager to get a folder in the Test Plan, then you need a TestFactory to get the test and finally you need the DesignStepFactory to get the steps. I'm no Python programmer but I hope you can get something out of this:
mg=td.TreeManager
npath="Subject\Test"
tsFolder = mg.NodeByPath(npath)
testFactory = tsFolder.TestFactory
testFilter = testFactory.Filter
testFilter["TS_NAME"] = "Some Test"
testList = testFactory.NewList(testFilter.Text)
test = testList.Item(1) # There should be only 1 item
print test.Name
stepFactory = test.DesignStepFactory
stepList = stepFactory.NewList("")
for step in stepList:
print step.StepName
It takes some time to get used to the QC OTA API documentation but I find it very helpful. Nearly all of my knowledge comes from the examples in the API documentation—for your problem there are examples like "Finding a unique test" or "Get a test object with name and path". Both examples are examples to the Test object. Even if the examples are in VB it should be no big thing to adapt them to Python.

I figured out the solution, if there is a better way to do this you are welcome to post it.
import win32com
from win32com.client import Dispatch
import getpass
def number_of_steps(name):
qcServer = "***"
qcUser = "***"
qcPassword = getpass.getpass('Password: ')
qcDomain = "***"
qcProject = "***"
td = win32com.client.Dispatch("TDApiOle80.TDConnection.1")
#Starting to connect
td.InitConnectionEx(qcServer)
td.Login(qcUser, qcPassword)
td.Connect(qcDomain, qcProject)
if td.Connected is True:
print "Connected to " + qcProject
else:
print "Connection failed"
mg = td.TreeManager # Tree manager
folder = mg.NodeByPath("Subject\Regression")
testList = folder.FindTests(name) # Make a list of tests matching name (partial match is accepted)
if testList is not None:
if len(testList) > 1:
print "There are multiple tests matching this name, please check input parameter\nTests matching"
for test in testList:
print test.name
td.Disconnect
td.Logout
return False
if len(testList) == 1:
print "In test %s there is %d steps" % (testList[0].Name, testList[0].DesStepsNum)
else:
print "There are no test with this test name in Quality Center"
td.Disconnect
td.Logout
return False
td.Disconnect
td.Logout
print "Disconnected from " + qcProject
return testList[0].DesStepsNum # Return number of steps for given test

Find text in gtk.TextView

I have a gtk.Textview. I want to find and select some of the text in this TextView programmatically.
I have this code but it's not working correctly.
search_str = self.text_to_find.get_text()
start_iter = textbuffer.get_start_iter()
match_start = textbuffer.get_start_iter()
match_end = textbuffer.get_end_iter()
found = start_iter.forward_search(search_str,0, None)
if found:
textbuffer.select_range(match_start,match_end)
If the text is found, then it selects all the text in the TextView, but I need it to select only the found text.

start_iter.forward_search returns a tuple of the start and end matches so your found variable has both match_start and match_end in it
this should make it work:
search_str = self.text_to_find.get_text()
start_iter = textbuffer.get_start_iter()
# don't need these lines anymore
#match_start = textbuffer.get_start_iter()
#match_end = textbuffer.get_end_iter()
found = start_iter.forward_search(search_str,0, None)
if found:
match_start,match_end = found #add this line to get match_start and match_end
textbuffer.select_range(match_start,match_end)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to get Text coordinates using PyUNO with OpenOffice writer - python

Related

Extract DOCX Comments

Python - LibreOffice Calc - Find & Replace with Regular Expression

Pyuno indexing issue that I would like an explanation for

Reading specific test steps from Quality Center with python

Find text in gtk.TextView

Categories

Resources