I have tried various ways of declaring the path to the template file, but for some reason the script fails to pick up the template from the specified location, load the report contents into it and then export it to PDF.
Here is the code I am using:
#Build the html report using the html template and save to the set location
output_from_parsed_template = buildTemplate()
with open(r"C:\python_report_scripts\anram_report.html","wb") as fh:
    fh.write(output_from_parsed_template)
#Convert html to pdf
subprocess.call('C:\omniformat\html2pdf995.exe r"C:\python_report_scripts\anram_report.html" r"C:\ANRAM_Requests\Working_folder\\'+sectionName+'.pdf"')
#Run a summary report when there is at least 1 section in the section list
if len(sectionInfo)>1:
    #Flag that this is a summary report
    summary = 1
    #Generate the Existing Map image
    MapExisting(summary)
    #Generate the Existing Risk Graph image
    RiskgraphExisting(summary)
    #Generate the New Map Image
    MapNew(summary)
    #Generate the New Risk Graph image
    RiskgraphNew(summary)
    #Generate the tabular results for the report
    populateResults(summary)
    #Indicate that the report is a summary, which then forms the report title
    #global sectionName
    sectionName = 'SUMMARY - '+sectionName
    #Build the html report using the html template and save to the set location
    output_from_parsed_template = buildTemplate()
    with open(r"C:\python_report_scripts\anram_report.html","wb") as fh:
        fh.write(output_from_parsed_template)
    #Convert html to pdf
    subprocess.call('C:\omniformat\html2pdf995.exe r"C:\python_report_scripts\anram_report.html" r"C:\ANRAM_Requests\Working_folder\\'+sectionName+'.pdf"')
So am I declaring it correctly?
Please advise.
This line looks weird:
#Convert html to pdf
subprocess.call('C:\omniformat\html2pdf995.exe r"C:\python_report_scripts\anram_report.html" r"C:\ANRAM_Requests\Working_folder\\'+sectionName+'.pdf"')
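You seem to be mixing up Python quoting and Python raw strings with shell quoting, and there are still some double backslashes around. The whole command is one ordinary (non-raw) string literal, so the inner r"..." prefixes are not raw strings at all: the letter r and the double quotes are passed to html2pdf995 as literal characters, and backslash escapes are still processed, so the \a in \anram_report.html turns into the BEL control character.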
I suggest:
Use subprocess.check_call instead; it raises CalledProcessError when the command exits with a non-zero status, so Python reports the failure instead of silently carrying on.
Pass it a list instead of a string; that way it's clear which argument is which, and you don't depend on shell quoting and word splitting.
Use raw strings consistently (r'...' or r"...", either one is fine).
Use os.path.join to join paths!
So, putting it all together, try this instead:
import os
import subprocess

# Convert html to pdf
subprocess.check_call([
    r'C:\omniformat\html2pdf995.exe',
    r'C:\python_report_scripts\anram_report.html',
    os.path.join(r'C:\ANRAM_Requests\Working_folder', sectionName + '.pdf'),
])
I hope this solves the issue you're seeing. If it doesn't, it should at least give you a more meaningful error message that you can act on.
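Because the question runs the same write-and-convert steps twice (once per section and once for the summary), a small helper avoids repeating them. This is a minimal sketch, assuming buildTemplate() returns bytes (the question opens the file with "wb") and that html2pdf995.exe takes the input HTML and the output PDF as its two arguments, as in the commands above; the names html2pdf_command and save_and_convert are hypothetical. It uses ntpath.join, which is what os.path.join is on Windows, so the output path is built with Windows rules whichever OS runs the sketch.

```python
import ntpath
import subprocess

# Paths taken from the question (Windows paths, hence raw strings)
HTML2PDF = r"C:\omniformat\html2pdf995.exe"
REPORT_HTML = r"C:\python_report_scripts\anram_report.html"
OUTPUT_DIR = r"C:\ANRAM_Requests\Working_folder"


def html2pdf_command(section_name):
    """Argument list for html2pdf995: program, input HTML, output PDF."""
    # ntpath.join applies Windows path rules even when run on another OS
    return [HTML2PDF, REPORT_HTML, ntpath.join(OUTPUT_DIR, section_name + ".pdf")]


def save_and_convert(html_bytes, section_name):
    """Write the rendered report, then convert it to PDF.

    Raises subprocess.CalledProcessError if html2pdf995 exits with a
    non-zero status.
    """
    with open(REPORT_HTML, "wb") as fh:
        fh.write(html_bytes)
    subprocess.check_call(html2pdf_command(section_name))
```

Usage would be save_and_convert(buildTemplate(), sectionName) after each section, and once more after sectionName = 'SUMMARY - ' + sectionName inside the summary branch.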
Context
I have been working for some time on a Python script that uses the docxtpl package (and Jinja2 for managing tags and templates) to automate the creation of MS Word reports.
My script (see below) is located in a base directory, along with an Excel document for auto-filling tags and a template Word document that it references. Within the base directory, there is a sub-directory (Image_loop) that contains a further directory for each placeholder image that must be replaced. The images are replaced using the alt text assigned to each placeholder image in the template document, which has the same name as the corresponding directory within Image_loop (Image1, Image 2, etc.). My directory setup can be seen in the screenshots below.
[Screenshot: Directory 1]
[Screenshot: Directory 2]
My Code
import os
from pathlib import Path

import pandas as pd
from docxtpl import DocxTemplate  # pip install docxtpl

base_dir = Path('/mnt/c/Users/XXX/Desktop/AUTOMATED_REPORTING')  # make sure base directory is your own, the one you are going to be working out of, in Ubuntu directory format
word_template_path = base_dir / "Template1.docx"  # set your word document template
excel_path = base_dir / "Book1.xlsx"  # set the reference excel document
output_dir = base_dir / "OUTPUT"  # set a directory for all outputs
output_dir.mkdir(exist_ok=True)  # creates directory if not already existing

df = pd.read_excel(excel_path, sheet_name="Sheet1", dtype=str)  # read the excel reference document as a pandas dataframe, datatype as string to avoid formatting issues
df2 = df.fillna('')  # turn NaN values into blanks, as we want no value to be displayed in some instances

doc = DocxTemplate(word_template_path)
image_filepath = base_dir / 'Image_loop'

for record in df2.to_dict(orient="records"):  # render the values from each Excel row into the template document
    output_path = output_dir / f"{record['Catchment']}-Test_document.docx"
    for address, dirs, files in os.walk(image_filepath):  # find the sub-directories of 'image_filepath' and their images, to replace the placeholder images in the template
        for directory in dirs:
            image_path = image_filepath / directory / f"{record['Catchment']}.png"
            if image_path.exists():
                doc.replace_pic(directory, image_path)
    doc.render(record)
    doc.save(output_path)
Problem (help please)
My problem is that for some of my reports, there are no images for some of the placeholders: some sub-directories within Image_loop (Image1, Image 2, etc.) contain no image for that specific report.
So while the sub-directory 'Image_1' may contain, for reports A, B, C and D:
Map_A.png (for report A)
Map_B.png (for report B)
Map_C.png (for report C)
Map_D.png (for report D)
i.e. a map for every report,
the sub-directory 'Image_2' contains, for the same reports, only:
Graph_A (for report A)
Graph_B (for report B)
Graph_D (for report D)
i.e. there is to be no graph for report C
I can already stop bullet points or tables in the template document from being printed when there is no corresponding value in the Excel document for a specific report. This is done directly in the template document, using docxtpl's paragraph-level tag ({%p ... %}) around a Jinja2 if statement (https://jinja.palletsprojects.com/en/3.0.x/templates/). It looks something like this:
{%p if <TEMPLATE_VALUE> != '' %}
{%p endif %}
(i.e. don't print the bullet points, table, etc., if there is no value to fill them with)
BUT if I wrap the same if statement around a template image in the template document, I get an error when running the code on Ubuntu Linux: ValueError: Picture ImageXYZ not found in the docx template
The error is raised by the last line of my code: doc.save(output_path). I assume this is because the Jinja2 '%p if' statement removes the placeholder image when there is no replacement image, which causes a problem when saving the report documents that are outliers (with no actual image to replace the placeholder). When the code is run, reports are generated for those that have images for all placeholders, but not for the 'outlier' document.
I'm sure there is a way to modify my code to generate the outlier reports even though the placeholder image is not going to be replaced. Perhaps with a try/except statement?
But I'm a bit stuck...
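A likely cause, offered as an assumption rather than a confirmed diagnosis: in the code above, doc is created once before the loop, and docxtpl keeps the pictures queued with replace_pic on the template object until the document is saved. An Image_2 replacement queued for report B would then still be pending when report C is saved, after report C's {%p if %} block has removed that placeholder, which would match the "Picture ... not found" error at doc.save. The sketch below creates a fresh DocxTemplate for each record and queues only the pictures that exist for that report, so no try/except should be needed. The helpers pictures_for_report and build_reports, and the has_<placeholder> context keys, are hypothetical; the directory layout and the <Catchment>.png naming follow the code above.

```python
from pathlib import Path


def pictures_for_report(image_root, catchment):
    """Map placeholder name -> replacement PNG for one report.

    Each sub-directory of image_root is named after a placeholder picture's
    alt text in the template and holds one '<catchment>.png' per report.
    Placeholders with no file for this report are left out.
    """
    found = {}
    for sub in sorted(p for p in Path(image_root).iterdir() if p.is_dir()):
        candidate = sub / f"{catchment}.png"
        if candidate.is_file():
            found[sub.name] = candidate
    return found


def build_reports(template_path, records, image_root, output_dir):
    """Render one .docx per record, using a fresh template for each report."""
    # Imported here so pictures_for_report() can be tried without docxtpl
    from docxtpl import DocxTemplate  # pip install docxtpl

    for record in records:
        # A new DocxTemplate per report: pictures queued for an earlier
        # report cannot carry over to this one.
        doc = DocxTemplate(str(template_path))
        pictures = pictures_for_report(image_root, record["Catchment"])
        for placeholder, png in pictures.items():
            doc.replace_pic(placeholder, str(png))
        # Hypothetical has_<placeholder> flags; a template block such as
        # {%p if has_Image2 %} ... {%p endif %} is then false exactly when
        # that report has no picture for the placeholder.
        context = {**record, **{f"has_{name}": True for name in pictures}}
        doc.render(context)
        doc.save(str(Path(output_dir) / f"{record['Catchment']}-Test_document.docx"))
```

With the variables from the script above, the call would be build_reports(word_template_path, df2.to_dict(orient="records"), image_filepath, output_dir). If you keep testing an Excel column in the template instead of the has_ flags, that column must be blank exactly for the reports whose image is missing.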
I need to access data from PDF form fields. I tried the PyPDF2 package with this code:
import PyPDF2
reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())
But this only gives me the regular page text, not the form fields.
Does anyone know how to read text from the form fields?
You can use the getFormTextFields() method to return a dictionary of form fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html); on PdfReader in newer PyPDF2 releases the same method is called get_form_text_fields(). Use the dictionary keys (the field names) to access the values (the field values). The following example might help:
from PyPDF2 import PdfReader
infile = "myInputPdf.pdf"
pdf_reader = PdfReader(infile)
# get_form_text_fields() is the PdfReader name of the older getFormTextFields();
# it returns a python dictionary mapping text-field names to their values
dictionary = pdf_reader.get_form_text_fields()
my_field_value = str(dictionary['my_field_name']) # use field name (dictionary key) to access field value (dictionary value)
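The text-fields method only covers text fields. For other field types (checkboxes, radio buttons, choice lists), a minimal sketch, assuming a PyPDF2 release whose PdfReader has get_fields(): that method returns a dictionary of field name to field dictionary (or None when the PDF has no form), and each field's current value is stored under its /V key. The helper name field_values is hypothetical.

```python
def field_values(fields):
    """Flatten the result of PdfReader.get_fields() into {field name: value}.

    get_fields() returns None for a PDF without a form. A field that has
    never been filled in has no /V entry, so its value comes back as None.
    Checkbox and radio values are PDF names such as '/Yes' or '/Off'.
    """
    if not fields:
        return {}
    return {name: field.get("/V") for name, field in fields.items()}
```

Usage: from PyPDF2 import PdfReader, then print(field_values(PdfReader('formular.pdf').get_fields())).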
There are Python libraries through which you can access PDF data. PDF is not a raw data format like CSV, TXT or TSV, so Python can't read the data inside PDF files directly.
There is a Python library called slate (see the Slate documentation). Read the documentation; I hope you will find the answer to your question there.
In the Maya help there is a specific flag, "buildLoadSettings", for the file command. It allows you to load information about the scene without loading the actual scene into Maya.
cmds.file( myFile, o=1, bls=True )
And it nicely prints out all the references. But how can I actually get those references? Anything would do; a file would be nice.
Querying for references only gives me the references in the current scene, and since "buildLoadSettings" does not load any nodes, I cannot get any info about anything.
This is from the help:
When used with the "o/open" flag it indicates that the specified file should be read for reference hierarchy information only. This information will be stored in temporary load settings under the name "implicitLoadSettings"
But what the hell is "implicitLoadSettings" and how can I get information from it?
implicitLoadSettings is a temp string saved by Maya, which is primarily intended for internal use within the Preload Reference Editor (see the link below).
You can read back your implicitLoadSettings with the selLoadSettings command:
http://download.autodesk.com/us/maya/2010help/CommandsPython/selLoadSettings.html
Basic example:
from maya import cmds
cmds.file('/path/to/file_with_references.mb', o=1, bls=1)
nsettings = range(cmds.selLoadSettings(ns=1, q=1))
# cast id numbers to strings and skip id 0
# (id '0' is the base file containing the references)
ids = [str(i) for i in nsettings if i]
print(cmds.selLoadSettings(ids, fn=1, q=1))
My point is that neither pod (from the appy framework, which I find a pain to use) nor the OpenOffice UNO bridge, which seems likely to be deprecated soon and requires OpenOffice.org to be running when my script launches, is satisfactory.
Can anyone point me to a neat way to produce a simple yet clean ODT (tables are my priority) without having to code it myself all over again?
Edit: I'm giving ODFpy a try; it seems to do what I need. More on that later.
Your mileage with odfpy may vary. I didn't like it. I ended up using a template ODT created in OpenOffice, opening its content.xml with zipfile and ElementTree and updating that (in your case, it would create only the relevant table row and table cell nodes), then writing everything back.
It is actually straightforward, except for getting ElementTree to work properly with the XML namespaces (which is badly documented). But it can be done. I don't have the example, sorry.
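A minimal sketch of that template approach using only the standard library (zipfile and xml.etree.ElementTree). Assumptions: the template ODT already contains a table that was given a name in OpenOffice/LibreOffice (e.g. Table1), and each row passed in has as many values as that table has columns; fill_table and _register_namespaces are hypothetical names. New cells get no cell style of their own. For the namespace problem, the sketch re-registers the file's own prefixes before writing, but ElementTree still drops declarations that nothing in the tree uses, which may matter if attribute values in the document refer to prefixes (formulas, for example).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs from the OpenDocument format
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def _register_namespaces(xml_bytes):
    """Reuse the file's own prefixes (office:, table:, ...) when writing back
    instead of ElementTree's generated ns0:, ns1:, ... (module-wide setting)."""
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)


def fill_table(template_odt, output_odt, table_name, rows):
    """Append one table row per item of rows to the named table of a template ODT."""
    with zipfile.ZipFile(template_odt) as zin:
        content = zin.read("content.xml")
        _register_namespaces(content)
        root = ET.fromstring(content)

        table = next(
            (t for t in root.iter(f"{{{TABLE_NS}}}table")
             if t.get(f"{{{TABLE_NS}}}name") == table_name),
            None,
        )
        if table is None:
            raise ValueError(f"no table named {table_name!r} in {template_odt}")

        for values in rows:
            row = ET.SubElement(table, f"{{{TABLE_NS}}}table-row")
            for value in values:
                cell = ET.SubElement(row, f"{{{TABLE_NS}}}table-cell",
                                     {f"{{{OFFICE_NS}}}value-type": "string"})
                ET.SubElement(cell, f"{{{TEXT_NS}}}p").text = str(value)

        new_content = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

        # Copy every member, swapping in the new content.xml. ODF expects the
        # "mimetype" member first and uncompressed; infolist() keeps the
        # template's order, and a valid template already lists it first.
        with zipfile.ZipFile(output_odt, "w") as zout:
            for item in zin.infolist():
                data = new_content if item.filename == "content.xml" else zin.read(item.filename)
                compression = zipfile.ZIP_STORED if item.filename == "mimetype" else zipfile.ZIP_DEFLATED
                zout.writestr(item.filename, data, compress_type=compression)
```

Usage: fill_table("template.odt", "report.odt", "Table1", [["Name", "Qty"], ["Widget", 3]]). Styling the table in the template, rather than in code, keeps the script small.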
To edit ODT files my answer may not help, but if you want to create new ODT files, you can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example showing how to write to an ODT file:
>>>from PyQt4 import QtGui
# Create a document object
>>>doc = QtGui.QTextDocument()
# Create a cursor pointing to the beginning of the document
>>>cursor = QtGui.QTextCursor(doc)
# Insert some text
>>>cursor.insertText('Hello world')
# Create a writer to save the document
>>>writer = QtGui.QTextDocumentWriter()
>>>writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>>odf_format = writer.supportedDocumentFormats()[1]
>>>writer.setFormat(odf_format)
>>>writer.setFileName('hello_world.odt')
>>>writer.write(doc) # Return True if successful
True
QTextCursor can also insert tables, frames, blocks and images. More information at:
http://qt-project.org/doc/qt-4.8/qtextcursor.html
As a bonus, you can also print to a PDF file using QPrinter.
I have an HTML table that I'd like to be able to export to an Excel file. I already have an option to export the table into an IQY file, but I'd prefer something that didn't allow the user to refresh the data via Excel. I just want a feature that takes a snapshot of the table at the time the user clicks the link/button.
I'd prefer it if the feature was a link/button on the HTML page that allows the user to save the query results displayed in the table. It would also be nice if the formatting from the HTML/CSS could be retained. Is there a way to do this at all? Or is there something I can modify in the IQY file?
I can try to provide more details if needed. Thanks in advance.
You can use the excellent xlwt module.
It is very easy to use, and creates files in the xls format (Excel 97-2003).
Here is an (untested!) example of use for a Django view:
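from django.http import HttpResponse
import xlwt


def excel_view(request):
    normal_style = xlwt.easyxf("font: name Verdana")
    # content_type replaced the old mimetype argument, which was removed in Django 1.7
    response = HttpResponse(content_type='application/vnd.ms-excel')
    # Make the browser download the file ("table.xls" is just an example name)
    response['Content-Disposition'] = 'attachment; filename="table.xls"'
    wb = xlwt.Workbook()
    ws0 = wb.add_sheet('Worksheet')
    ws0.write(0, 0, "something", normal_style)
    wb.save(response)  # xlwt can write to any file-like object, including HttpResponse
    return response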
Use CSV. There's a module in Python ("csv") to generate it, and Excel can read it natively.
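A minimal sketch of that with the standard csv module; the helper name write_table_csv is hypothetical. Because a Django HttpResponse is file-like, the same function can write straight into a response.

```python
import csv


def write_table_csv(out, header, rows):
    """Write a header row plus data rows to any file-like object as CSV.

    csv.writer quotes values that contain commas, quotes or newlines, so
    Excel splits the columns correctly.
    """
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
```

In a Django view: response = HttpResponse(content_type="text/csv"), response["Content-Disposition"] = 'attachment; filename="table.csv"', then write_table_csv(response, header, rows) and return response. When writing to a real file instead, open it with newline="", as the csv module documentation recommends.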
Excel supports opening an HTML file containing a table as a spreadsheet (even with CSS formatting).
You basically have to serve that HTML content from a Django view, with the content type application/ms-excel, as Roberto said.
Or if you feel adventurous, you could use something like Downloadify to prepare the file to be downloaded on the client side.
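For the server-side HTML approach, a minimal sketch of building such a document with the standard library; html_table_for_excel is a hypothetical helper, and values are escaped with html.escape so characters like & and < survive. In a Django view you could return HttpResponse(html_table_for_excel(header, rows), content_type="application/ms-excel") with a Content-Disposition: attachment header and a .xls file name. Recent Excel versions may warn that the file format and extension don't match before opening such a file.

```python
from html import escape


def html_table_for_excel(header, rows):
    """Build a minimal HTML document containing one table for Excel to open."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    # The charset declaration helps Excel read non-ASCII text correctly
    return (
        '<html><head><meta charset="utf-8"></head><body>'
        f"<table><tr>{head}</tr>{body}</table></body></html>"
    )
```

Inline CSS (style attributes on the cells) can be added to the generated tags if you want some of the page's formatting to carry over.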