I want to print my summary stats sometimes to console, and also other times to Word.
I don't want my code to be littered with lines calling to Word, because then I'd need to find and comment out like 100 lines each time I just wanted the console output.
I've thought about using a flag variable up at the front and changing it to false when I wanted to print versus not, but that's also a hassle.
The best solution I came up with was to write a separate script that opens a document, writes by calling my first summary stats script, and then closes the document:
import sys
import RunSummaryStats
from docx import Document
filename = "demo.docx"
document = Document()
document.save(filename)
f = open(filename, 'w')
sys.stdout = f
# actually call my summary stats script here: Call RunSummaryStats etc.
print("5")
f.close()
However, when I tried doing the above with python docx, upon opening my docs file I received the error We're sorry, we can't open this document because some parts are missing or invalid. As you can see the code above just printed out one number so it can't be a problem with the data I'm trying to write.
Finally, it needs to go to Word and not other file formats, to format some data tables.
By the way, this is an excerpt of RunSummaryStats. You can see how it's already filled with print lines which are helpful when I'm still exploring the data, and which I don't want to get rid of/replace with adding into a list:
The easy thing is to let cStringIO do the work, and separate collecting all your data from writing it into a file. That is:
import RunSummaryStats
import sys
# first, collect all your data into a StringIO object
orig_stdout = sys.stdout
stat_buffer = cStringIO.StringIO()
sys.stdout = stat_buffer
try:
# actually call my summary stats script here: Call RunSummaryStats etc.
print("5")
finally:
sys.stdout = orig_stdout
# then, write the StringIO object's contents to a Document.
from docx import Document
filename = "demo.docx"
document = Document()
document.write(add_paragraph(stat_buffer.getvalue()))
document.save(filename)
The Document() constructor essentially creates the .docx file package (this is actually a .zip archive of lots of XML and other stuff, which later the Word Application parses and renders etc.).
This statement f = open(filename, 'w') opens that file object (NB: this does not open Word Application, nor does it open a Word Document instance) and then you dump your stdout into that object. That is 100% of the time going to result in a corrupted Word Document; because you simply cannot write to a word document that way. You're basically creating a plain text file with a docx extension, but none of the underlying "guts" that make a docx a docx. As a result, Word Application doesn't know what to do with it.
Modify your code so that this "summary" procedure returns an iterable (the items in this iterable will be whatever you want to put in the Word Document). Then you can use something like the add_paragraph method to add each item to the Word Document.
def get_summary_stats(console=False):
"""
if console==True, results will be printed to console
returns a list of string for use elsewhere
"""
# hardcoded, but presume you will actually *get* these information somehow, modify as needed:
stats = ["some statistics about something", "more stuff about things"]
if console:
for s in stats:
print(s)
return stats
Then:
filename = "demo.docx"
document = Document()
# True will cause this function to print to console
for stat in get_summary_stats(True):
document.add_paragraph(stat)
document.save(filename)
So maybe there was a better way to do it, but in the end I
created a single function out of my summary stats script def run_summary
created a function based on #Charles Duffy's answer def print_word where StringIO reads from RunSummaryStats.run_summary(filepath, filename)
called def_print_word in my final module. There I set the variables for path, filename, and raw data source like so:
PrintScriptToWord.print_word(ATSpath, RSBSfilename, curr_file + ".docx")
I welcome any suggestions to improve this or other approaches.
Related
Background
I'm writing a html book about doing things in python. It contains a ton of text and code interspersed with output. I want to be able to modify my python code at any time, and have one script to batch update all the HTML files with the changes.
Since I'll have tons of html files and tons of python snippets, I can't copy and paste from python into HTML manually every time I change something. that will be a nightmare.
Edit:
So I have two complimentary files: a python file that contains the logic, and an HTML file that is the tutorial for that python code. I'd like to be able to edit my python files at will, and tag them so that my HTML file can be updated too.
At the moment, when I update my python code, I just run a separate python script that looks in both files for matching tags, and copies the python code between its tags over to the HTML between its matching tags. I have many such tags throughout the files. I also have many such pairs of files.
the tags are ^tagname and end with ^/tagname
However, this only works for the code itself, not the code's output. I'd like to also be able to (if desired) copy the output from the tagged python over to to the html as well, and have it appear inside a slightly modified tag.
I was thinking for output &tagname and &/tagname.
End Edit
But getting the output from the python files to do the same thing is proving to be quite difficult.
My python code is: (testCode.py)
#%%
# ^test
x=1
print(x)
# ^/test
#%%
print("this output I don't care about")
#%%
# ^test2
y=2
print(y)
# ^/test2
So, the test and test2 tags are how I want to split the code up.
My html looks like this:
<p> Some text explaining the purpose of the python code in test1 </p>
<!--^test-->
<!--^/test-->
<p> some more text explaining about test2</p>
<!--^test2-->
<!--^/test2-->
The code between test1 tags in python file get copied between the comments above. And the same for test 2.
Here's a screenshot of what I have so far in my actual html document. It looks great, but it's missing the output.
The problem is I can't figure out to split up the output based on tags inside comments in the code. I need to use these tags to split the output into chucks associated with each tag.
Desired output
My desired output is a string such as the following:
# ^test
1
# ^/test
this is output I don't care about
# ^test2
2
# ^/test2
Attempt
I've successfully captured the output of the file into a string using:
python_file = 'testCode.py'
process_result = subprocess.run(['python', './' + python_file], capture_output=True, universal_newlines=True)
output = str(process_result.stdout)
but obviously that just prints:
1
this is output I don't care about
2
I am using subprocess because this function will eventually be called in a loop with a list of python files to update the output from.
I'm completely stumped about to get access to the appropriate tags and intersperse them in.
Perhaps I'm approaching this wrong.
Note: This solution works, but is messy.
Any suggestions for improvement would be appreciated!
Based on #flakes comment, I've come up with a working solution, but it's not ideal. I'm positive there's a better, more elegant way to do this.
I've decided to reduce the automation a bit to also reduce needed code and complexity.
For each article I now have a folder containing
ArticleName.py
ArticleName.html
ArticleName_helper.py
OutputFiles [Folder]
Inside ArticleName.py, I have tagged code. If I want to save the output of the code inside the tag, I also create an OutputTag object (described below).
#%%
out = OutputTag('test', __file__)
# ^test
x=1
print(x)
print("test")
# ^/test
out.done()
OuputTag object and an object to replace sdtout (Stored in a different folder)
import os
import sys
#%%
class CustomStdout(object):
def __init__(self, *files):
self.files = files
def write(self, obj):
for f in self.files:
f.write(obj)
f.flush() # If you want the output to be visible immediately
def flush(self) :
for f in self.files:
f.flush()
#%%
class OutputTag:
def __init__(self, tag, origin_file):
origin_dir = os.path.dirname(origin_file)
if not os.path.exists(origin_dir) :
raise FileNotFoundError("Origin file does not exist!")
dir = origin_dir + "/OutputFiles"
if not os.path.exists(dir):
os.makedirs(dir)
file_name = os.path.join(dir, tag + '.txt')
self.original = sys.stdout
self.f = open(file_name, 'w')
# This will go to stdout and the file out.txt
sys.stdout = CustomStdout(sys.stdout, self.f)
def done(self):
self.f.close();
sys.stdout = self.original
Ok so that populates the OutputFiles folder with a bunch of .txt files named after the tags.
Then I run the ArticleName_helper.py script.
It has a reference to the py and html files.
The helper script looks for code-tags inside ArticleName.html.
It searches inside ArticleName.py for matching tags, and copies any code between them.
This copied code then replaces any existing text between the tags in the HTML.
Similarly, it searches for output tags (&Tag instead of ^Tag) in html.
It looks for files in OutputFolder that match the tag name.
Then loads the text inside, replacing any text in the html file between the output tags.
In my IDE I have auto code completion. In python, when I type html and press tab it auto generates the following with a multi-cursor ready to overwrite TAG in all 3 spots:
out = OutputTag('TAG', __file__)
# ^TAG
# ^/TAG
out.done()
similarly in html editing, I have it autocomplete the following when I type py and hit tab:
<pre><code class="language-python">
<!--^TAG-->
<!--^/TAG-->
</code></pre>
And a similar one for output with the ^ replaced by &
So the current workflow is:
Write python file
Add tags if wanted
Write HTML tutorial, adding comment tags where needed for both input and associated output.
Run Python file to update output
Run Helper Script
HTML file auto populates with code and output where tags were added.
After making edits, just do steps 4-6.
The result!
I have a docx file in which I need to edit its paragraphs (the paragraphs might contain equations). I tried to do these jobs using python-docx but it was not successful since editing the text of each paragraph and replacing it with the edited new paragraph needs to call p.add_paragraphs(editText(paragraph.text)) which ignores and omits any mathematical equation.
By searching for a method to gain this goal I found that this job is possible through XML codes by finding <w:t> tags and editing their content like this:
tree= ET.parse(filename)
root=tree.getroot()
for par in root.findall('w:p'):
if par.find('w:r'):
myText= par.find('w:r').find('w:t')
myText.text= editText(myText.text)
Then I must save the result as docx.
My quation is: what the format of filename is? should it be a document.xml file? If so, how can I reach that from my original document.docx file? and one more question is that how can I save the result as a .docx file again?
For saving docx as xml, I have given a try to save it by document.save('Document2.xml'). But the content of the result was not correct.
Would you give me some advice how to do them?
Not experienced with this at all, but perhaps this is what you were looking for?
https://virantha.com/2013/08/16/reading-and-writing-microsoft-word-docx-files-with-python/
From the article:
import zipfile
from lxml import etree
class DocsWriter:
def __init__(self, docx_file):
self.zipfile = zipfile.ZipFile(docx_file)
def _write_and_close_docx (self, xml_content, output_filename):
""" Create a temp directory, expand the original docx zip.
Write the modified xml to word/document.xml
Zip it up as the new docx
"""
tmp_dir = tempfile.mkdtemp()
self.zipfile.extractall(tmp_dir)
with open(os.path.join(tmp_dir,'word/document.xml'), 'w') as f:
xmlstr = etree.tostring (xml_content, pretty_print=True)
f.write(xmlstr)
# Get a list of all the files in the original docx zipfile
filenames = self.zipfile.namelist()
# Now, create the new zip file and add all the filex into the archive
zip_copy_filename = output_filename
with zipfile.ZipFile(zip_copy_filename, "w") as docx:
for filename in filenames:
docx.write(os.path.join(tmp_dir,filename), filename)
# Clean up the temp dir
shutil.rmtree(tmp_dir)
From what I can tell, this code block writes an xml document as .docx. Refer to the article for more context.
Python is not the best tool for this. Use VBA if you need to automate something in a Word document, or multiple Word documents. I can't tell what you are even trying to do here, but let's start at the beginning, with something simple. If, for instance, you want to loop through all paragraphs in your Word document, and select only the equations, you can run the code below to do just that.
Sub SelectAllEquations()
Dim xMath As OMath
Dim I As Integer
With ActiveDocument
.DeleteAllEditableRanges wdEditorEveryone
For I = 1 To .OMaths.Count
Set xMath = .OMaths.Item(I)
xMath.Range.Paragraphs(1).Range.Editors.Add wdEditorEveryone
Next
.SelectAllEditableRanges wdEditorEveryone
.DeleteAllEditableRanges wdEditorEveryone
End With
End Sub
Again, I don't know what your end game is, but I think it's worthwhile to start with something like that, and build on your foundation.
How can I create a word (.docx) document if not found using python and write in it?
I certainly cannot do either of the following:
file = open(file_name, 'r')
file = open(file_name, 'w')
or, to create or append if found:
f = open(file_name, 'a+')
Also I cannot find any related info in python-docx documentation at:
https://python-docx.readthedocs.io/en/latest/
NOTE:
I need to create an automated report via python with text and pie charts, graphs etc.
Probably the safest way to open (and truncate) a new file for writing is using 'xb' mode. 'x' will raise a FileExistsError if the file is already there. 'b' is necessary because a word document is fundamentally a binary file: it's a zip archive with XML and other files inside it. You can't compress and decompress a zip file if you convert bytes through character encoding.
Document.save accepts streams, so you can pass in a file object opened like that to save your document.
Your work-flow could be something like this:
doc = docx.Document(...)
...
# Make your document
...
with open('outfile.docx', 'xb') as f:
doc.save(f)
It's a good idea to use with blocks instead of raw open to ensure the file gets closed properly even in case of an error.
In the same way that you can't simply write to a Word file directly, you can't append to it either. The way to "append" is to open the file, load the Document object, and then write it back, overwriting the original content. Since the word file is a zip archive, it's very likely that appended text won't even be at the end of the XML file it's in, much less the whole docx file:
doc = docx.Document('file_to_append.docx')
...
# Modify the contents of doc
...
doc.save('file_to_append.docx')
Keep in mind that the python-docx library may not support loading some elements, which may end up being permanently discarded when you save the file this way.
Looks like I found an answer:
The important point here was to create a new file, if not found, or
otherwise edit the already present file.
import os
from docx import Document
#checking if file already present and creating it if not present
if not os.path.isfile(r"file_path"):
#Creating a blank document
document = Document()
#saving the blank document
document.save('file_name.docx')
#------------editing the file_name.docx now------------------------
#opening the existing document
document = Document('file_name.docx')
#editing it
document.add_heading("hello world" , 0)
#saving document in the end
document.save('file_name.docx')
Further edits/suggestions are welcome.
So I basically just want to have a list of all the pixel colour values that overlap written in a text file so I can then access them later.
The only problem is that the text file is having (set([ or whatever written with it.
Heres my code
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('RGB.png')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Also, if I run this program again but with a different picture for comparison, will it append the data or overwrite it? It's kinda hard to tell when just looking at numbers.
If you replace these lines:
im=cv2.imread('RGB.png')
File= open('Weedlist', 'w')
File.write(str(ColourCount))
with:
import sys
im=cv2.imread(sys.argv[1])
open(sys.argv[1]+'Weedlist', 'w').write(str(list(ColourCount)))
you will get a new file for each input file and also you don't have to overwrite the RGB.png every time you want to try something new.
Files opened with mode 'w' will be overwritten. You can use 'a' to append.
You opened the file with the 'w' mode, write mode, which will truncate (empty) the file when you open it. Use 'a' append mode if you want data to be added to the end each time
You are writing the str() conversion of a set object to your file:
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Don't use str to convert the whole object; format your data to a string you find easy to read back again. You probably want to add a newline too if you want each new entry to be added on a new line. Perhaps you want to sort the data too, since a set lists items in an ordered determined by implementation details.
If comma-separated works for you, use str.join(); your set contains tuples of integer numbers, and it sounds as if you are fine with the repr() output per tuple, so we can re-use that:
with open('Weedlist', 'a') as outputfile:
output = ', '.join([str(tup) for tup in sorted(ColourCount)])
outputfile.write(output + '\n')
I used with there to ensure that the file object is automatically closed again after you are done writing; see Understanding Python's with statement for further information on what this means.
Note that if you plan to read this data again, the above is not going to be all that efficient to parse again. You should pick a machine-readable format. If you need to communicate with an existing program, you'll need to find out what formats that program accepts.
If you are programming that other program as well, pick a format that other programming language supports. JSON is widely supported for example (use the json module and convert your set to a list first; json.dump(sorted(ColourCount), fileobj), then `fileobj.write('\n') to produce newline-separated JSON objects could do).
If that other program is coded in Python, consider using the pickle module, which writes Python objects to a file efficiently in a format the same module can load again:
with open('Weedlist', 'ab') as picklefile:
pickle.dump(ColourCount, picklefile)
and reading is as easy as:
sets = []
with open('Weedlist', 'rb') as picklefile:
while True:
try:
sets.append(pickle.load(output))
except EOFError:
break
See Saving and loading multiple objects in pickle file? as to why I use a while True loop there to load multiple entries.
How would you like the data to be written? Replace the final line by
File.write(str(list(ColourCount)))
Maybe you like that more.
If you run that program, it will overwrite the previous content of the file. If you prefer to apprend the data open the file with:
File= open('Weedlist', 'a')
I'm using Reportlab to create PDFs. I'm creating two PDFs which I want to merge after I created them. Reportlab provides a way to save a pycanvas (source) (which is basically my pdf file in memory) as a python file, and calling the method doIt(filename) on that python file, will recreate the pdf file. This is great, since you can combine two PDFs on source code basis and create one merge pdf.
This is done like this:
from reportlab.pdfgen import canvas, pycanvas
#create your canvas
p = pycanvas.Canvas(buffer,pagesize=PAGESIZE)
#...instantiate your pdf...
# after that, close the PDF object cleanly.
p.showPage()
p.save()
#now create the string equivalent of your canvas
source_code_equiv = str(p)
source_code_equiv2 = str(p)
#merge the two files on str. basis
#not shown how it is exactly done, to make it more easy to read the source
#actually one just have to take the middle part of source_code_equiv2 and add it into source_code_equiv
final_pdf = source_code_equiv_part1 + source_code_equiv2_center_part + source_code_equiv_part2
#write the source-code equivalent of the pdf
open("n2.py","w").write(final_pdf)
from myproject import n2
p = n2.doIt(buffer)
# Get the value of the StringIO buffer and write it to the response.
pdf = buffer.getvalue()
buffer.close()
response.write(pdf)
return response
This works fine, but I want to skip the step that I save the n2.py to the disk. Thus I'm looking for a way to instantiate from the final_pdf string the corresponding python class and use it directly in the source. Is this possible?
It should work somehow like this..
n2 = instantiate_python_class_from_source(final_pdf)
p = n2.doIt(buffer)
The reason for this is mainly that there is not really a need to save the source to the disk, and secondly that it is absolutely not thread save. I could name the created file at run time, but then I do not know what to import!? If there is no way to prevent the file saving, is there a way to define the import based on the name of the file, which is defined at runtime!?
One might ask why I do not create one pdf in advance, but this is not possible, since they are coming from different applications.
This seems like a really long way around to what you want. Doesn't Reportlab have a Canvas class from which you can pull the PDF document? I don't see why generated Python source code should be involved here.
But if for some reason it is necessary, then you can use StringIO to "write" the source to a string, then exec to execute it:
from cStringIO import StringIO
source_code = StringIO()
source_code.write(final_pdf)
exec(source_code)
p = doIt(buffer)
Ok, I guess you could use code module which provides standard interpreter’s interactive mode. The following would execute function doIt.
import code
import string
coded_data = """
def doIt():
print "XXXXX"
"""
script = coded_data + "\ndoIt()\n"
co = code.compile_command(script, "<stdin>", "exec")
if co:
exec co
Let me know, if this helped.