I'd like to generate documentation in reST, but rather than writing the reST source by hand I want a Python script to produce it, and then use Sphinx to build other formats (HTML, PDF) from it.
Imagine I have a telephone book in binary format. I use a Python script to parse it and generate a document with all the names and numbers:
phone_book = PhonebookParser("somefile.bin")
restdoc = restProducer.NewDocument()
for entry in phone_book:
    restdoc.add_section(title=entry.name, body=entry.number)
restdoc.write_to_file("phonebook.rst")
Then I would go on to invoke sphinx for generating pdf and html:
> sphinx phonebook.rst -o phonebook.pdf
> sphinx phonebook.rst -o phonebook.html
Is there a python module (aka restProducer in the example above) that offers an API for generating reST? Or is the best way to just dump reST markup via a couple of print statements?
See Automatically Generating Documentation for All Python Package Contents.
The upcoming Sphinx 1.1 release includes a sphinx-apidoc.py script.
EDIT:
Now that you have explained the problem a bit more, I'd say: go for the "dump reST markup via a couple of print statements" option. You seem to be thinking along those lines already. Why not try to implement a minimalistic restProducer?
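A minimalistic restProducer along those lines could look like the sketch below. The class name RestDocument and its methods are made up for illustration (they mirror the hypothetical API in the question, not an existing library). It only collects titles, underlines and body text, and does not escape reST markup that might appear in the data.

```python
class RestDocument:
    """Collects reST sections in memory and renders them as one string."""

    def __init__(self, title=None):
        self._lines = []
        if title:
            # Document title: over- and underlined with '='
            bar = "=" * len(title)
            self._lines += [bar, title, bar, ""]

    def add_section(self, title, body, underline="-"):
        # A section underline must be at least as long as the title text
        self._lines += [title, underline * len(title), "", body, ""]

    def render(self):
        # Join the collected lines into the final reST source
        return "\n".join(self._lines)

    def write_to_file(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(self.render())
```

With this, the loop from the question becomes a matter of calling add_section once per phone book entry and write_to_file("phonebook.rst") at the end.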
If you want docs-without-writing-docs (which will at best give you an API reference rather than real docs), then the autosummary and autodoc extensions for Sphinx may be what you're after.
If your goal is to compose the document programmatically once and then output it in multiple formats, you could have a look at QTextDocument in the PyQt framework, although it is overkill for this.
from PyQt4.QtGui import *
import sys
# QPrinter: Must construct a QApplication before a QPaintDevice,
# so create the application object first.
app = QApplication(sys.argv)
doc = QTextDocument()
cur = QTextCursor(doc)
d_font = QFont('Times New Roman')
doc.setDefaultFont(d_font)
# Two-column table taking 30% and 70% of the available width
table_fmt = QTextTableFormat()
table_fmt.setColumnWidthConstraints([
    QTextLength(QTextLength.PercentageLength, 30),
    QTextLength(QTextLength.PercentageLength, 70)
])
table = cur.insertTable(5, 2, table_fmt)
cur.insertText('sample text 1')
cur.movePosition(cur.NextCell)
cur.insertText('sample text 2')
# Print to a pdf file
printer = QPrinter(QPrinter.HighResolution)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName('sample.pdf')
doc.print_(printer)  # render the document to sample.pdf
# Save to file
writer = QTextDocumentWriter()
writer.setFormat(writer.supportedDocumentFormats()[1])
writer.setFileName('sample.odt')
writer.write(doc)
QTextDocumentWriter supports plaintext, html and ODF. QPrinter can be used to print to a physical printer or to a PDF file.
However, a templating engine like Jinja2, as you mentioned, is a neater way to do it.
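As a rough illustration of the templating route, here is a minimal sketch that renders the phone book entries to reST with Jinja2. The template text, the function name render_phonebook and the shape of the entries (dicts with "name" and "number" keys) are assumptions made for this example, not something given in the question.

```python
from jinja2 import Template

# One reST section per entry: title, underline of matching length, body.
PHONEBOOK_TEMPLATE = Template(
    "{% for entry in entries %}"
    "{{ entry.name }}\n"
    "{{ '-' * (entry.name | length) }}\n"
    "\n"
    "{{ entry.number }}\n"
    "\n"
    "{% endfor %}"
)


def render_phonebook(entries):
    """Return reST source for an iterable of entries (hypothetical helper)."""
    return PHONEBOOK_TEMPLATE.render(entries=list(entries))
```

The returned string can be written to phonebook.rst and handed to Sphinx like any hand-written source file.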
Related
I would like to relink a Photoshop Smart Object to a new file using Python.
In Photoshop, this action is performed with the "Relink to File" button.
I've found some solutions in other programming languages but couldn't make them work in Python, here's one for example: Photoshop Scripting: Relink Smart Object
Editing Contents of a Smart Object would also be a good option, but I can't seem to figure that one out either.
So far I have this:
import win32com.client
psApp = win32com.client.Dispatch('Photoshop.Application')
psDoc = psApp.Application.ActiveDocument
for layer in psDoc.layers:
    if layer.kind == 17:  # layer kind 17 is Smart Object
        print(layer.name)
        # here it should either "Relink to File" or "Edit Contents" of a Smart Object
I have figured out a workaround! I simply ran JavaScript in Python.
This is the code to Relink to File.... You could do a similar thing for Edit Contents but I haven't tried it yet, as relinking works better for me.
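Keep in mind that new_img_path is pasted into a JavaScript string literal, so every backslash has to reach JavaScript doubled; as far as I'm aware, that means passing a raw string with doubled backslashes, for example: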
new_img_path = r"C:\\Users\\miha\\someEpicPic.jpg"
import photoshop.api as ps
def js_relink(new_img_path):
    jscode = r"""
    var desc = new ActionDescriptor();
    desc.putPath(stringIDToTypeID('null'), new File("{}"));
    executeAction(stringIDToTypeID('placedLayerRelinkToFile'), desc, DialogModes.NO);
    """.format(new_img_path)
    JavaScript(jscode)
def JavaScript(js_code):
    app = ps.Application()
    app.doJavaScript(js_code)
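One way to avoid doubling the backslashes by hand is to let Python produce the JavaScript string literal. The sketch below uses json.dumps for that, on the assumption that its output (a double-quoted string with escaped backslashes and quotes) is also a valid JavaScript string literal for ordinary file paths. The helper name build_relink_script is hypothetical; the JavaScript statements are the ones from the answer above.

```python
import json


def build_relink_script(new_img_path):
    """Return the relink JavaScript with the path safely quoted."""
    # json.dumps adds the surrounding quotes and escapes backslashes/quotes
    js_path = json.dumps(str(new_img_path))
    return (
        "var desc = new ActionDescriptor();\n"
        "desc.putPath(stringIDToTypeID('null'), new File(%s));\n"
        "executeAction(stringIDToTypeID('placedLayerRelinkToFile'), "
        "desc, DialogModes.NO);\n" % js_path
    )
```

The resulting string would be passed to app.doJavaScript in the same way as jscode, and the caller can then use a normal path such as r"C:\Users\miha\someEpicPic.jpg".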
I'm having trouble trying to create Table of Contents objects in a PDF file. I'm not sure whether I've understood the process from Apple's limited documentation.
I'm using python, but cogent examples in any language are welcome to explain how it's supposed to work. The code creates a new PDF document, but there's no outline item visible in Preview. I've tried just using myOutline as the root object, but that doesn't work either.
import Quartz
from Foundation import NSURL
# infile and outfile are file paths defined elsewhere
pdfURL = NSURL.fileURLWithPath_(infile)
myPDF = Quartz.PDFDocument.alloc().initWithURL_(pdfURL)
if myPDF:
    # Create Destination
    myPage = myPDF.pageAtIndex_(1)
    pagePoint = Quartz.CGPointMake(0, 0)
    myDestination = Quartz.PDFDestination.alloc().initWithPage_atPoint_(myPage, pagePoint)
    # Create Outline
    myOutline = Quartz.PDFOutline.alloc().init()
    myOutline.setLabel_("Interesting")
    myOutline.setDestination_(myDestination)
    # Create a root Outline and add the first outline as a child
    rootOutline = Quartz.PDFOutline.alloc().init()
    rootOutline.insertChild_atIndex_(myOutline, 0)
    # Add the root outline to the document and save
    myPDF.setOutlineRoot_(rootOutline)
    myPDF.writeToFile_(outfile)
EDIT: Actually, the outline IS getting saved to the new file: I can read it programmatically, and it appears in Acrobat as a Bookmark; however, it doesn't show up in Preview's Table of Contents (yes, I checked for the "Hide" thing). If I add another Bookmark in Acrobat, then both show up in Preview.
So I guess that either I'm still doing something wrong which doesn't quite 'finish' the PDFOutline data properly, and Acrobat is being kind; or there's a massive bug in PDFKit that means you can't write PDFOutlines properly. I get the same behaviour on Mountain Lion, FWIW.
This does appear to be a bug in Preview. It will not list the Table of Contents if it contains ONLY ONE child entry.
If I add more outlines with the code above, then all of them appear in Preview. If I use other software to remove all but one entry in the Table of Contents, then Preview will not show any.
I need to generate a customized PDF copy of a template document.
The easiest way - I thought - was to create a source PDF that has some placeholder text where customization needs to happen, i.e. <first_name> and <last_name>, and then replace these with the correct values.
I've searched high and low, but is there really no way of basically taking the source template PDF, replacing the placeholders with actual values and writing the result to a new PDF?
I looked at PyPDF2 and ReportLab, but neither seems to be able to do so.
Any suggestions? Most of my searches lead to using a Perl app, CAM::PDF, but I'd prefer to keep it all in Python.
There is no direct way to do this that will work reliably. PDFs are not like HTML: they specify the positioning of text character-by-character. They may not even include the whole font used to render the text, just the characters needed to render the specific text in the document. No library I've found will do nice things like re-wrap paragraphs after updating the text. PDFs are for the most part a display-only format, so you'll be much better off using a tool that turns markup into a PDF than updating the PDF in-place.
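To make the markup-first route concrete, here is a minimal sketch of the substitution step only: the template lives as text markup with the same <first_name>-style placeholders, and Python fills them in before the result is handed to whatever markup-to-PDF tool you choose (that conversion is not shown). The function name fill_placeholders is made up for this example.

```python
import re

# Matches placeholders such as <first_name> or <last_name>
PLACEHOLDER = re.compile(r"<(\w+)>")


def fill_placeholders(template_text, values):
    """Replace <key> placeholders with values[key]; leave unknown keys alone."""
    def substitute(match):
        key = match.group(1)
        # Unknown names (e.g. real markup tags) are returned unchanged
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template_text)
```

For example, fill_placeholders("Dear <first_name> <last_name>", {"first_name": "Ada", "last_name": "Lovelace"}) returns "Dear Ada Lovelace", and any paragraph re-wrapping is then handled by the PDF generator rather than by editing the PDF.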
If that's not an option, you can create a PDF form in something like Acrobat, then use a PDF manipulation library like iText (AGPL) or PDFBox, which has a nice Clojure wrapper called pdfboxing that can handle some of that.
In my experience, Python's support for writing to PDFs is pretty limited. Java has, by far, the best language support. Also, you get what you pay for, so it would probably be worth paying for an iText license if you're using this for commercial purposes. I've had pretty good results writing Python wrappers around PDF-manipulation CLI tools like pdfboxing and Ghostscript. That will probably be much easier for your use case than trying to shoehorn this into Python's PDF ecosystem.
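As an illustration of what such a wrapper can look like, here is a small sketch around Ghostscript that concatenates PDFs. It is not a solution to the placeholder problem itself, only an example of the wrapping pattern. The function names are hypothetical, and the executable name "gs" is an assumption (on Windows it is typically gswin64c).

```python
import subprocess


def build_gs_merge_command(inputs, output, executable="gs"):
    """Build the Ghostscript argument list for merging PDFs into one file."""
    return [
        executable,
        "-dBATCH",            # exit after processing instead of prompting
        "-dNOPAUSE",          # do not pause between pages
        "-q",                 # suppress startup messages
        "-sDEVICE=pdfwrite",  # write PDF output
        "-sOutputFile=" + output,
    ] + list(inputs)


def merge_pdfs(inputs, output):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_merge_command(inputs, output), check=True)
```

Keeping the command construction separate from the call makes the wrapper easy to check without having Ghostscript installed.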
There is no definitive solution, but I found two approaches that work most of the time.
In Python, https://github.com/JoshData/pdf-redactor gives good results. Here is the example code:
import re
import pdf_redactor
options = pdf_redactor.RedactorOptions()
# Redact a specific name and things that look like social security numbers,
# replacing the text with X's.
options.content_filters = [
    # First redact a literal name.
    (
        re.compile(u"Tom Xavier"),
        lambda m: "XXXXXXX"
    ),
    # Then do an actual SSN regex.
    # See https://github.com/opendata/SSN-Redaction for why this regex is complicated.
    (
        re.compile(r"(?<!\d)(?!666|000|9\d{2})([OoIli0-9]{3})([\s-]?)(?!00)([OoIli0-9]{2})\2(?!0{4})([OoIli0-9]{4})(?!\d)"),
        lambda m: "XXX-XX-XXXX"
    ),
]
# Perform the redaction using PDF on standard input and writing to standard output.
pdf_redactor.redactor(options)
Full Example can be found here
In Ruby, https://github.com/gettalong/hexapdf works for blacking out text.
Example code:
require 'hexapdf'
class ShowTextProcessor < HexaPDF::Content::Processor
  def initialize(page, to_hide_arr)
    super()
    @canvas = page.canvas(type: :overlay)
    @to_hide_arr = to_hide_arr
  end
  def show_text(str)
    boxes = decode_text_with_positioning(str)
    return if boxes.string.empty?
    if @to_hide_arr.include? boxes.string
      @canvas.stroke_color(0, 0, 0)
      boxes.each do |box|
        x, y = *box.lower_left
        tx, ty = *box.upper_right
        @canvas.rectangle(x, y, tx - x, ty - y).fill
      end
    end
  end
  alias :show_text_with_positioning :show_text
end
file_name = ARGV[0]
strings_to_black = ARGV[1].split("|")
doc = HexaPDF::Document.open(file_name)
puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
doc.pages.each.with_index do |page, index|
  processor = ShowTextProcessor.new(page, strings_to_black)
  page.process_contents(processor)
end
new_file_name = "#{file_name.split('.').first}_updated.pdf"
doc.write(new_file_name, optimize: true)
puts "Writing updated file [#{new_file_name}]."
Note that this only draws black boxes over the matching text; the text itself is still in the file and becomes visible when it is selected.
As another solution, you may try the Aspose.PDF Cloud SDK for Python, which provides a feature to replace text in a PDF document.
First, install the Aspose.PDF Cloud SDK for Python:
pip install asposepdfcloud
The sample code below uploads a PDF file to your cloud storage and replaces multiple strings in the document:
import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
# Get App key and App SID from https://aspose.cloud
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
#upload PDF file to storage
pdf_api.upload_file(remote_name,filename)
#Replace Text
text_replace1 = asposepdfcloud.models.TextReplace(old_value='origami',new_value='aspose',regex='true')
text_replace2 = asposepdfcloud.models.TextReplace(old_value='candy',new_value='biscuit',regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace1,text_replace2])
response = pdf_api.post_document_text_replace(remote_name, text_replace_list)
print(response)
I'm a developer evangelist at Aspose.
I've got a Microsoft Word document with an embedded macro. I've managed to load the document using this example: Loading a document on OpenOffice using an external Python program
Now I'm trying to get the macro code from my document, but can't figure out how to do this. I've stumbled upon an interface that can probably be used (http://www.openoffice.org/api/docs/common/ref/com/sun/star/document/XEmbeddedScripts.html), though it's unclear to me how to use it in Python.
So how can I extract the macro text from a document using Python UNO?
Which version of LO are you using?
Normally, I would do something like
doc = desktop.loadComponentFromURL(url, "_blank", 0, () )
# the Basic Script Library/Libraries
the_basic_libs = doc.BasicLibraries
if the_basic_libs.hasElements():
    the_standard = the_basic_libs.getByName("Standard")
    the_one = the_standard.getByName("Module1")
    print(the_one)
But my version (LO 4.1.3.2) gives me a "no such element" exception, though I can see and access the element using MRI (or the GUI).
Maybe it is a flaw in LO or UNO, or it is due to the fact that we are testing with a *.doc file.
I want to programmatically (using Python) split a multi-page tiff into single pages using Adobe Acrobat's exposed COM Objects.
I am answering my own question to put a viable answer out there, as I did not find anyone doing this on SO or any other forum.
Please let me know what you think about my solution, and feel free to post your own way of doing this.
Here is one way:
from win32com.client import Dispatch
def acrobat_split(f_path, f_name, f_ext):
    # Connect to Adobe Acrobat.
    avDoc = Dispatch("AcroExch.AVDoc")
    # Open the input file (as a pdf).
    src = f_path + '\\' + f_name + f_ext
    avDoc.Open(src, src)
    pdDoc = avDoc.GetPDDoc()
    page_ct = pdDoc.GetNumPages()
    # Set dst. PAGE_DIV is not defined in this snippet; it is assumed to be a
    # separator string defined elsewhere in the module.
    dst = f_path + '\\' + f_name + PAGE_DIV + ".tif"
    jsObject = pdDoc.GetJSObject()
    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.saveAs(dst, "com.adobe.acrobat.tiff")
    pdDoc.Close()
    del pdDoc