Extract macro from Office document using pyUNO

I've got a Microsoft Word document with an embedded macro. I've managed to load the document using the example from "Loading a document on OpenOffice using an external Python program".
Now I'm trying to get the macro code from my document, but I can't figure out how to do this. I've stumbled upon an interface that can probably be used (http://www.openoffice.org/api/docs/common/ref/com/sun/star/document/XEmbeddedScripts.html), though it's unclear to me how to use it from Python.
So how can I extract the macro text from a document using Python UNO?

Which version of LO are you using?
Normally, I would do something like
doc = desktop.loadComponentFromURL(url, "_blank", 0, ())
# the Basic script library/libraries
the_basic_libs = doc.BasicLibraries
if the_basic_libs.hasElements():
    the_standard = the_basic_libs.getByName("Standard")
    the_one = the_standard.getByName("Module1")
    print(the_one)
But my version (LO 4.1.3.2) gives me a "no such element" exception, though I can see and access the element using MRI (or the GUI).
Maybe it's a flaw in LO or UNO, or it's due to the fact that we are testing with a *.doc file.
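One thing that might be worth trying, offered as a guess rather than a confirmed fix: library containers expose isLibraryLoaded() and loadLibrary(), so the library could be loaded explicitly before its modules are read. The sketch below walks every library and module instead of hard-coding "Standard" and "Module1". The helper name dump_basic_modules and the two Fake* classes are made up for illustration; the fakes only stand in for doc.BasicLibraries so the snippet runs without LibreOffice.

```python
def dump_basic_modules(basic_libs):
    """Return {(library, module): source} for every Basic module.

    basic_libs is expected to behave like doc.BasicLibraries. Whether the
    explicit loadLibrary() call avoids the "no such element" exception is
    an assumption; it has not been checked against LO 4.1.3.2.
    """
    sources = {}
    for lib_name in basic_libs.getElementNames():
        if not basic_libs.isLibraryLoaded(lib_name):
            basic_libs.loadLibrary(lib_name)
        library = basic_libs.getByName(lib_name)
        for module_name in library.getElementNames():
            sources[(lib_name, module_name)] = library.getByName(module_name)
    return sources


class FakeLibrary:
    """Stand-in for one Basic library (hypothetical, for the demo only)."""

    def __init__(self, modules):
        self._modules = modules

    def getElementNames(self):
        return tuple(self._modules)

    def getByName(self, name):
        return self._modules[name]


class FakeLibraries:
    """Stand-in for doc.BasicLibraries (hypothetical, for the demo only)."""

    def __init__(self, libraries):
        self._libraries = libraries
        self.loaded = set()

    def getElementNames(self):
        return tuple(self._libraries)

    def isLibraryLoaded(self, name):
        return name in self.loaded

    def loadLibrary(self, name):
        self.loaded.add(name)

    def getByName(self, name):
        return FakeLibrary(self._libraries[name])


fake_libs = FakeLibraries({"Standard": {"Module1": "Sub Main\nEnd Sub"}})
modules = dump_basic_modules(fake_libs)
for (lib_name, module_name), source in modules.items():
    print(lib_name, module_name)
    print(source)
```

With a real document you would pass doc.BasicLibraries instead of fake_libs.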

Related

Use "Relink to File" button in Photoshop using Python

I would like to relink a Photoshop Smart Object to a new file using Python.
In Photoshop, this action is performed with the "Relink to File" button.
I've found some solutions in other programming languages but couldn't make them work in Python. Here's one, for example: Photoshop Scripting: Relink Smart Object
Editing the contents of a Smart Object (the "Edit Contents" button) would also be a good option, but I can't seem to figure that one out either.
So far I have this:
import win32com.client
psApp = win32com.client.Dispatch('Photoshop.Application')
psDoc = psApp.Application.ActiveDocument
for layer in psDoc.layers:
    if layer.kind == 17:  # layer kind 17 is Smart Object
        print(layer.name)
        # here it should either "Relink to File" or "Edit Contents" of a Smart Object

I have figured out a workaround! I simply ran JavaScript from Python.
This is the code to Relink to File. You could do a similar thing for Edit Contents, but I haven't tried it yet, as relinking works better for me.
Keep in mind that, as far as I'm aware, new_img_path must be a raw string with doubled backslashes. The path is pasted into a JavaScript string literal, where each \\ is read as a single backslash. For example:
new_img_path = r"C:\\Users\\miha\\someEpicPic.jpg"
import photoshop.api as ps

def js_relink(new_img_path):
    # Build the script and substitute the image path into it
    jscode = r"""
    var desc = new ActionDescriptor();
    desc.putPath(stringIDToTypeID('null'), new File("{}"));
    executeAction(stringIDToTypeID('placedLayerRelinkToFile'), desc, DialogModes.NO);
    """.format(new_img_path)
    JavaScript(jscode)

def JavaScript(js_code):
    # Run the script in the Photoshop application
    app = ps.Application()
    app.doJavaScript(js_code)
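If doubling the backslashes by hand is error-prone, one option is to let json.dumps produce the quoted, escaped literal. This is only a sketch: build_relink_js and JS_TEMPLATE are names made up here, and it assumes that a JSON string literal is accepted as a JavaScript string literal by Photoshop's script engine, which I have not tested. The snippet only builds the script text; it does not talk to Photoshop.

```python
import json

# Same script as above, but the path placeholder receives a complete,
# already quoted string literal instead of raw text between quotes.
JS_TEMPLATE = """
var desc = new ActionDescriptor();
desc.putPath(stringIDToTypeID('null'), new File({path_literal}));
executeAction(stringIDToTypeID('placedLayerRelinkToFile'), desc, DialogModes.NO);
"""


def build_relink_js(new_img_path):
    # json.dumps adds the surrounding double quotes and escapes
    # backslashes and quotes inside the path.
    return JS_TEMPLATE.format(path_literal=json.dumps(new_img_path))


# An ordinary Windows path with single backslashes.
js = build_relink_js(r"C:\Users\miha\someEpicPic.jpg")
print(js)
```

The resulting string could then be passed to app.doJavaScript() as in the answer above.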

Use imshow with Matlab Python engine

After building and installing the Python engine shipped with Matlab 2019b in Anaconda
(TestEnvironment) PS C:\Program Files\MATLAB\R2019b\extern\engines\python> C:\Users\USER\Anaconda3\envs\TestEnvironment\python.exe .\setup.py build -b C:\Users\USER\MATLAB\build_temp install
for Python 3.7 I wrote a simple script to test a couple of features I'm interested in:
import matlab.engine as ml_e
# Start Matlab engine
eng = ml_e.start_matlab()
# Load MAT file into engine. The result is a dictionary
mat_file = "samples/lena.mat"
lenaMat = eng.load(mat_file)
print('Variables found in "' + mat_file + '"')
for key in lenaMat.keys():
    print(key)
# print(lenaMat["lena512"])
# Use the variable from the MAT file to display it as an image
eng.imshow(lenaMat["lena512"], [])
I have a problem with imshow() (or any similar function that displays a figure in the Matlab GUI): the figure appears briefly and then disappears, which, I guess, at least confirms that it is possible to use it. The only way to keep it on the screen is to add an infinite loop at the end:
while True:
    continue
For obvious reasons this is not a good solution. I am not looking for a way to convert the Matlab data to NumPy or similar and display it using matplotlib or other third-party libraries (I am aware that SciPy can load MAT files, for example). The reason is simple: I would like to use Matlab (including loading whole environments), and for debugging purposes I'd like to be able to show this or that result without having to jump through hoops converting the data manually.

Search and replace placeholder text in PDF with Python

I need to generate a customized PDF copy of a template document.
The easiest way, I thought, was to create a source PDF that has some placeholder text where customization needs to happen, i.e. <first_name> and <last_name>, and then replace these with the correct values.
I've searched high and low, but is there really no way of taking the source template PDF, replacing the placeholders with actual values and writing the result to a new PDF?
I looked at PyPDF2 and ReportLab, but neither seems to be able to do so.
Any suggestions? Most of my searches lead to using a Perl app, CAM::PDF, but I'd prefer to keep it all in Python.

There is no direct way to do this that will work reliably. PDFs are not like HTML: they specify the positioning of text character-by-character. They may not even include the whole font used to render the text, just the characters needed to render the specific text in the document. No library I've found will do nice things like re-wrap paragraphs after updating the text. PDFs are for the most part a display-only format, so you'll be much better off using a tool that turns markup into a PDF than updating the PDF in-place.
If that's not an option, you can create a PDF form in something like Acrobat, then use a PDF manipulation library like iText (AGPL) or PDFBox, which has a nice Clojure wrapper called pdfboxing that can handle some of that.
From my experience, Python's support for writing to PDFs is pretty limited. Java has, by far, the best language support. Also, you get what you pay for, so it would probably be worth paying for an iText license if you're using this for commercial purposes. I've had pretty good results writing Python wrappers around PDF-manipulation CLI tools like pdfboxing and Ghostscript. That will probably be much easier for your use case than trying to shoehorn this into Python's PDF ecosystem.
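To illustrate the "fill in the markup first, then turn it into a PDF" route, here is a minimal sketch using only the standard library. The template text, the field names and the render_letter helper are made up for the example, and the step that converts the filled text into a PDF is deliberately left out, since it depends on the tool you pick.

```python
from string import Template

# Hypothetical source template: plain text with $-style placeholders
# instead of placeholder text baked into a finished PDF.
TEMPLATE = Template("""\
Dear $first_name $last_name,

Your customized copy is attached.
""")


def render_letter(first_name, last_name):
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-customized document is never produced silently.
    return TEMPLATE.substitute(first_name=first_name, last_name=last_name)


text = render_letter("Ada", "Lovelace")
print(text)
```

The filled text can then be handed to whatever markup-to-PDF tool you settle on.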

There is no definitive solution, but I found two solutions that work most of the time.
In Python, https://github.com/JoshData/pdf-redactor gives good results. Here is the example code:
import re
import pdf_redactor

options = pdf_redactor.RedactorOptions()
# Redact a specific name and things that look like social security numbers,
# replacing the text with X's.
options.content_filters = [
    # First redact a literal name.
    (
        re.compile(u"Tom Xavier"),
        lambda m: "XXXXXXX"
    ),
    # Then do an actual SSN regex.
    # See https://github.com/opendata/SSN-Redaction for why this regex is complicated.
    (
        re.compile(r"(?<!\d)(?!666|000|9\d{2})([OoIli0-9]{3})([\s-]?)(?!00)([OoIli0-9]{2})\2(?!0{4})([OoIli0-9]{4})(?!\d)"),
        lambda m: "XXX-XX-XXXX"
    ),
]
# Perform the redaction using PDF on standard input and writing to standard output.
pdf_redactor.redactor(options)
Full Example can be found here
In Ruby, https://github.com/gettalong/hexapdf works for blacking out text.
Example code:
require 'hexapdf'

class ShowTextProcessor < HexaPDF::Content::Processor
  def initialize(page, to_hide_arr)
    super()
    @canvas = page.canvas(type: :overlay)
    @to_hide_arr = to_hide_arr
  end

  def show_text(str)
    boxes = decode_text_with_positioning(str)
    return if boxes.string.empty?
    if @to_hide_arr.include? boxes.string
      @canvas.stroke_color(0, 0, 0)
      boxes.each do |box|
        x, y = *box.lower_left
        tx, ty = *box.upper_right
        @canvas.rectangle(x, y, tx - x, ty - y).fill
      end
    end
  end
  alias :show_text_with_positioning :show_text
end

file_name = ARGV[0]
strings_to_black = ARGV[1].split("|")
doc = HexaPDF::Document.open(file_name)
puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
doc.pages.each.with_index do |page, index|
  processor = ShowTextProcessor.new(page, strings_to_black)
  page.process_contents(processor)
end
new_file_name = "#{file_name.split('.').first}_updated.pdf"
doc.write(new_file_name, optimize: true)
puts "Writing updated file [#{new_file_name}]."
This blacks out the matching text, but the text itself stays in the file, so it is still visible when selected.

As another solution, you may try the Aspose.PDF Cloud SDK for Python. It provides a feature to replace text in a PDF document.
First, install the Aspose.PDF Cloud SDK for Python:
pip install asposepdfcloud
Sample code to upload a PDF file to your cloud storage and replace multiple strings in a PDF document:
import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
# Get App key and App SID from https://aspose.cloud
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
#upload PDF file to storage
pdf_api.upload_file(remote_name,filename)
#Replace Text
text_replace1 = asposepdfcloud.models.TextReplace(old_value='origami',new_value='aspose',regex='true')
text_replace2 = asposepdfcloud.models.TextReplace(old_value='candy',new_value='biscuit',regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace1,text_replace2])
response = pdf_api.post_document_text_replace(remote_name, text_replace_list)
print(response)
I'm a developer evangelist at Aspose.

Read binary data off Windows clipboard, in Blender (python)

EDIT: Figured THIS part out, but see 2nd post below for another question.
(a little backstory here, skip ahead for the TLDR :) )
I'm currently trying to write a few scripts for Blender to help improve the level creation workflow for a game that I play (Natural Selection 2). Currently, to move geometry from the level editor to Blender, I have to:
1) Save a file from the editor as an .obj.
2) Import the .obj into Blender and make my changes.
3) Export to the game's level format using an exporter script I wrote.
4) Re-open the file in a new instance of the editor.
5) Copy the level data from the new instance.
6) Paste into the main level file.
This is quite a pain to do, and quite clearly discourages even using the tool at all but for major edits. My idea for an improved workflow:
1) Copy data to the clipboard in the editor.
2) Run an importer script in Blender to load the data.
3) Run an exporter script in Blender to save the data.
4) Paste back into the original file.
This not only cuts out two whole steps in the tedious process, but also eliminates the need for extra files cluttering up my desktop. Currently though, I haven't found a way to read in clipboard data from the Windows clipboard into Blender... at least not without having to go through some really elaborate installation steps (e.g. install Python 3.1, install pywin32, move x,y,z to the Blender directory, uninstall Python 3.1... etc...)
TLDR
I need help finding a way to write/read BINARY data to/from the clipboard in Blender. I'm not concerned about cross-platform capability -- the game tools are Windows only.
Ideally -- though obviously beggars can't be choosers here -- the solution would not make it too difficult to install the script for the layman. I'm (hopefully) not the only person who is going to be using this, so I'd like to keep the installation instructions as simple as possible. If there's a solution available in the python standard library, that'd be awesome!
Things I've looked at already/am looking at now
Pyperclip -- plaintext ONLY. I need to be able to read BINARY data off the clipboard.
pywin32 -- Kept getting missing DLL file errors, so I'm sure I'm doing something wrong. Need to take another stab at this, but the steps I had to take were pretty involved (see last sentence above TLDR section :) )
TKinter -- didn't read too far into this one as it seemed to only read plain-text.
ctypes -- actually just discovered this in the process of writing this post. Looks scary as hell, but I'll give it a shot.

Okay I finally got this working. Here's the code for those interested:
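from ctypes import *
from binascii import hexlify
kernel32 = windll.kernel32
user32 = windll.user32
# Declare pointer-sized types so handles are not truncated on 64-bit Python
user32.GetClipboardData.restype = c_void_p
kernel32.GlobalSize.argtypes = [c_void_p]
kernel32.GlobalSize.restype = c_size_t
kernel32.GlobalLock.argtypes = [c_void_p]
kernel32.GlobalLock.restype = c_void_p
kernel32.GlobalUnlock.argtypes = [c_void_p]
user32.OpenClipboard(0)
CF_SPARK = user32.RegisterClipboardFormatW("application/spark editor")
if user32.IsClipboardFormatAvailable(CF_SPARK):
    data = user32.GetClipboardData(CF_SPARK)
    size = kernel32.GlobalSize(data)
    data_locked = kernel32.GlobalLock(data)
    text = string_at(data_locked, size)  # raw bytes, embedded NULs included
    kernel32.GlobalUnlock(data)
else:
    print('No spark data in clipboard!')
user32.CloseClipboard()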

Welp... this is a new record for me (posting a question and almost immediately finding an answer).
For those interested, I found this: How do I read text from the (windows) clipboard from python?
It's exactly what I'm after... sort of. I used that code as a jumping-off point.
Instead of CF_TEXT = 1
I used CF_SPARK = user32.RegisterClipboardFormatW("application/spark editor")
Here's where I got that function name from: http://msdn.microsoft.com/en-us/library/windows/desktop/ms649049(v=vs.85).aspx
The 'W' is there because, for whatever reason, Blender doesn't see the plain-old "RegisterClipboardFormat" function; you have to use "...FormatW" or "...FormatA". Not sure why that is. If somebody knows, I'd love to hear about it! :)
Anyways, haven't gotten it actually working yet: still need to find a way to break this "data" object up into bytes so I can actually work with it, but that shouldn't be too hard.
Scratch that, it's giving me quite a bit of difficulty.
Here's my code
from ctypes import *
from binascii import hexlify
kernel32 = windll.kernel32
user32 = windll.user32
user32.OpenClipboard(0)
CF_SPARK = user32.RegisterClipboardFormatW("application/spark editor")
if user32.IsClipboardFormatAvailable(CF_SPARK):
    data = user32.GetClipboardData(CF_SPARK)
    data_locked = kernel32.GlobalLock(data)
    print(data_locked)
    text = c_char_p(data_locked)
    print(text)
    print(hexlify(text))
    kernel32.GlobalUnlock(data_locked)
else:
    print('No spark data in clipboard!')
user32.CloseClipboard()
There aren't any errors, but the output is wrong. The line print(hexlify(text)) yields b'e0cb0c1100000000', when I should be getting something that's 946 bytes long, the first 4 of which should be 01 00 00 00. (Here's the clipboard data, saved out from InsideClipboard as a .bin file: https://www.dropbox.com/s/bf8yhi1h5z5xvzv/testLevel.bin?dl=1 )
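A likely explanation for that output, offered as an interpretation rather than something confirmed in the thread: c_char_p(data_locked) only wraps the address, so hexlify sees the pointer itself (8 bytes on 64-bit Python, which matches the 16 hex digits above). Its .value would not help either, because it stops at the first NUL byte, and the data starts with 01 00 00 00. string_at(address, size) copies exactly size bytes instead. The sketch below uses a local buffer as a stand-in for the locked clipboard memory so that it runs on any platform; the payload is made up.

```python
import ctypes
from binascii import hexlify

# Made-up binary payload that, like the clipboard data, starts with 01 00 00 00.
payload = b"\x01\x00\x00\x00spark"

# Stand-in for the memory returned by GlobalLock(): a buffer plus its address.
buf = ctypes.create_string_buffer(payload, len(payload))
addr = ctypes.addressof(buf)

ptr = ctypes.c_char_p(addr)

# hexlify() on the c_char_p object dumps the pointer, not the data it points to.
pointer_hex = hexlify(ptr)

# .value treats the memory as a C string and stops at the first NUL byte.
as_c_string = ptr.value

# string_at() with an explicit size returns all bytes, NULs included.
as_bytes = ctypes.string_at(addr, len(payload))

print(pointer_hex)
print(as_c_string)
print(hexlify(as_bytes))
```

In the clipboard code, the size would come from kernel32.GlobalSize(data), called on the handle returned by GetClipboardData.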

How to generate reST/sphinx source from python?

I'd like to generate documentation via reST, but I don't want to write the reST source manually. Instead, I want a Python script to do that, and then produce other formats (HTML, PDF) with Sphinx.
Imagine I have a telephone book in binary format. Now I use a Python script to parse this and generate a document with all the names and numbers:
phone_book = PhonebookParser("somefile.bin")
restdoc = restProducer.NewDocument()
for entry in phone_book:
    restdoc.add_section(title=entry.name, body=entry.number)
restdoc.write_to_file("phonebook.rst")
Then I would go on to invoke Sphinx for generating PDF and HTML:
> sphinx phonebook.rst -o phonebook.pdf
> sphinx phonebook.rst -o phonebook.html
Is there a Python module (aka restProducer in the example above) that offers an API for generating reST? Or is the best way to just dump reST markup via a couple of print statements?

See Automatically Generating Documentation for All Python Package Contents.
The upcoming Sphinx 1.1 release includes a sphinx-apidoc.py script.
EDIT:
Now that you have explained the problem a bit more, I'd say: go for the "dump reST markup via a couple of print statements" option. You seem to be thinking along those lines already. Why not try to implement a minimalistic restProducer?
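A minimalistic restProducer could look something like the sketch below. The class name RestDocument and its methods are made up to mirror the calls in the question; it handles only titled sections with a plain-text body and does no escaping of reST markup characters in the input.

```python
class RestDocument:
    """Hypothetical minimal reST producer: collects sections, then writes them."""

    def __init__(self):
        self._lines = []

    def add_section(self, title, body):
        # A reST section title is underlined with a line of punctuation
        # at least as long as the title itself.
        self._lines += [title, "=" * len(title), "", body, ""]

    def render(self):
        return "\n".join(self._lines)

    def write_to_file(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(self.render())


# Made-up phone book entries standing in for the parsed binary file.
restdoc = RestDocument()
restdoc.add_section(title="Alice", body="555-0100")
restdoc.add_section(title="Bob", body="555-0199")
rendered = restdoc.render()
print(rendered)
```

Calling restdoc.write_to_file("phonebook.rst") would then give you a file to feed into the usual Sphinx build.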

If you want docs-without-writing-docs (which will at best give you an API reference rather than real docs), then the autosummary and autodoc extensions for Sphinx may be what you're after.

If your purpose is to programmatically compose the document once and be able to output it in multiple formats, you could have a look at QTextDocument in the PyQt framework. It is overkill, though.
from PyQt4.QtGui import *
import sys
doc = QTextDocument()
cur = QTextCursor(doc)
d_font = QFont('Times New Roman')
doc.setDefaultFont(d_font)
table_fmt = QTextTableFormat()
table_fmt.setColumnWidthConstraints([
    QTextLength(QTextLength.PercentageLength, 30),
    QTextLength(QTextLength.PercentageLength, 70)
])
table = cur.insertTable(5,2, table_fmt)
cur.insertText('sample text 1')
cur.movePosition(cur.NextCell)
cur.insertText('sample text 2')
# Print to a pdf file
# QPrinter: Must construct a QApplication before a QPaintDevice
app = QApplication(sys.argv)
printer = QPrinter(QPrinter.HighResolution)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName('sample.pdf')
doc.print_(printer)  # without this call, no PDF is written
# Save to file
writer = QTextDocumentWriter()
writer.setFormat(writer.supportedDocumentFormats()[1])
writer.setFileName('sample.odt')
writer.write(doc)
QTextDocumentWriter supports plaintext, html and ODF. QPrinter can be used to print to a physical printer or to a PDF file.
However, a templating engine like Jinja2, as you mentioned, is a neater way to do it.
