Using python-docx-split-run - python

I am looking to modify a program that currently uses python-docx to import text from a .txt file into a specific part of a .docx file. Currently I use a find_replace feature. I found a great-looking project on GitHub, but I am having difficulty figuring out what I've done wrong so far. Here is the project:
https://github.com/alllexx88/python-docx-split-run
Here's what I've written:
def insert_run_after(par, run, txt=''):
    """Insert a new run with text {txt} into paragraph after given {run}.
    Returns the newly created run.
    """
    run_2 = par.add_run(txt)
    run._r.addnext(run_2._r)
    return run_2

document = Document('Psychevaltemplate2.docx')
par = document.paragraphs[0]
run = par.runs[0]
background = input("what is the location of the background file?")
input_doc = Document(background)
insert_run_after(par, 5, 'TEST RESULTS:')
output_doc.save("sampleoutput2.docx")
exit()
and here's the error:
run._r.addnext(run_2._r)
AttributeError: 'int' object has no attribute '_r'
Any help would be greatly appreciated.

In the line insert_run_after(par, 5, 'TEST RESULTS:') you are passing the integer 5 as the run argument, but the function expects a run object (it accesses run._r).
Maybe you meant this?
insert_run_after(par, par.runs[5], "TEST RESULTS:")
or possibly:
insert_run_at_position(par, 5, "TEST RESULTS:")
which is one of the other functions available in that project.
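As a minimal sketch, the helper from the question could guard against this mistake and fail with a clearer message. The hasattr check and the wording of the error are illustrative additions, not part of the linked project; the rest follows the function as posted.

```python
def insert_run_after(par, run, txt=""):
    """Insert a new run with text `txt` into `par` right after `run`.

    `run` must be a run object such as par.runs[0], not an integer index.
    """
    # Illustrative guard (an assumption, not in the original project):
    # anything without a _r attribute cannot be a python-docx run.
    if not hasattr(run, "_r"):
        raise TypeError(
            "run must be a run object such as par.runs[0], got "
            + type(run).__name__
        )
    new_run = par.add_run(txt)       # appended at the end of the paragraph
    run._r.addnext(new_run._r)       # moved so it directly follows `run`
    return new_run
```

With this guard, insert_run_after(par, 5, 'TEST RESULTS:') raises a TypeError that names the offending type instead of the AttributeError on _r.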

Can't find scoring.py when using PythonScriptStep() in Databricks

We are defining a PythonScriptStep() in Databricks. When we use PythonScriptStep() in our pipeline script, the scoring.py file cannot be found.
scoring_step = PythonScriptStep(
    name="Scoring_Step",
    source_directory=os.getenv("DATABRICKS_NOTEBOOK_PATH", "/Users/USER_NAME/source_directory"),
    script_name="./scoring.py",
    arguments=["--input_dataset", ds_consumption],
    compute_target=pipeline_cluster,
    runconfig=pipeline_run_config,
    allow_reuse=False)
We get the following error message:
Step [Scoring_Step]: script not found at: /databricks/driver/scoring.py. Make sure to specify an appropriate source_directory on the Step or default_source_directory on the Pipeline.
For some reason Databricks is searching for the file in '/databricks/driver/' instead of the folder we entered.
It is also possible to use DatabricksStep() instead of PythonScriptStep(), but for specific reasons we need to use the PythonScriptStep() class.
Could anybody help us with this specific problem?
Thank you very much for any help!
scoring_step = PythonScriptStep(
    name="Scoring_Step",
    source_directory=os.getenv("DATABRICKS_NOTEBOOK_PATH", "/Users/USER_NAME/source_directory"),
    script_name="./scoring.py",
    arguments=["--input_dataset", ds_consumption],
    compute_target=pipeline_cluster,
    runconfig=pipeline_run_config,
    allow_reuse=False)
Replace the code block above with the one below; it should resolve the error.
data_ref = OutputFileDatasetConfig(
    name='data_ref',
    destination=(ds, '/data')
).as_upload()
data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
Reference link for the documentation
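Before building the step, it may help to confirm that the script really exists under the directory being passed as source_directory, as seen from the machine that submits the pipeline. The sketch below assumes the script has to be present on the local filesystem at submission time; resolve_script is a hypothetical helper written for illustration, not part of the Azure ML SDK.

```python
import os


def resolve_script(source_directory, script_name):
    """Return (absolute source directory, script path relative to it).

    Hypothetical helper: raises FileNotFoundError early, with the full
    path that was checked, instead of failing later inside the pipeline.
    """
    source_directory = os.path.abspath(source_directory)
    script_path = os.path.normpath(os.path.join(source_directory, script_name))
    if not os.path.isfile(script_path):
        raise FileNotFoundError(
            "script not found at %s (current working directory: %s)"
            % (script_path, os.getcwd())
        )
    # Pass a clean relative name such as "scoring.py" rather than "./scoring.py"
    return source_directory, os.path.relpath(script_path, source_directory)
```

The two returned values could then be passed as source_directory and script_name when constructing the PythonScriptStep. If the check fails, the message shows which path was actually inspected, which makes a wrong working directory or an unset environment variable easier to spot.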

Use "Relink to File" button in Photoshop using Python

I would like to relink a Photoshop Smart Object to a new file using Python.
In Photoshop, this action is performed with the "Relink to File" button.
I've found some solutions in other programming languages but couldn't make them work in Python; here's one, for example: Photoshop Scripting: Relink Smart Object
Editing the contents of a Smart Object (the "Edit Contents" button in Photoshop) would also be a good option, but I can't seem to figure that one out either.
So far I have this:
import win32com.client

psApp = win32com.client.Dispatch('Photoshop.Application')
psDoc = psApp.Application.ActiveDocument
for layer in psDoc.layers:
    if layer.kind == 17:  # layer kind 17 is Smart Object
        print(layer.name)
        # here it should either "Relink to File" or "Edit Contents" of a Smart Object
I have figured out a workaround! I simply ran JavaScript from Python.
This is the code to Relink to File. You could do a similar thing for Edit Contents, but I haven't tried it yet, as relinking works better for me.
Keep in mind that, as far as I'm aware, new_img_path must be a raw string with doubled backslashes: the path is pasted into a JavaScript string literal, where a single backslash would start an escape sequence. For example:
new_img_path = r"C:\\Users\\miha\\someEpicPic.jpg"
import photoshop.api as ps

def js_relink(new_img_path):
    jscode = r"""
    var desc = new ActionDescriptor();
    desc.putPath(stringIDToTypeID('null'), new File("{}"));
    executeAction(stringIDToTypeID('placedLayerRelinkToFile'), desc, DialogModes.NO);
    """.format(new_img_path)
    JavaScript(jscode)

def JavaScript(js_code):
    app = ps.Application()
    app.doJavaScript(js_code)
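One way to avoid doubling the backslashes by hand is to let Python produce the JavaScript string literal. The sketch below is an alternative to the formatting step above, not the original author's method: build_relink_js is a hypothetical name, and it relies on json.dumps returning a double-quoted literal with backslashes and quotes escaped, which JavaScript also accepts as a string literal.

```python
import json


def build_relink_js(new_img_path):
    """Return the relink script with the path embedded as a JS string literal."""
    # json.dumps turns C:\Users\x.jpg into "C:\\Users\\x.jpg" (quotes included),
    # so an ordinary Windows path can be passed in unchanged.
    path_literal = json.dumps(new_img_path)
    return (
        "var desc = new ActionDescriptor();\n"
        "desc.putPath(stringIDToTypeID('null'), new File(%s));\n"
        "executeAction(stringIDToTypeID('placedLayerRelinkToFile'), desc, DialogModes.NO);\n"
        % path_literal
    )
```

The resulting string can be handed to the JavaScript() helper above, for example JavaScript(build_relink_js(r"C:\Users\miha\someEpicPic.jpg")), with single backslashes in the Python path.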

How to search for a word in Word2Vec model

We were given an assignment to research codes and methods to solve "Author Name Disambiguation". I was trying to understand the code provided by "joe817" on GitHub, the repository's link is:
https://github.com/joe817/name-disambiguation
I installed all the requirements and successfully ran the first file, "data processing.py", but the second file, "DRLgru.py", fails at line 43 with an error saying the model (a Word2Vec model) is not iterable. I googled the issue but was not able to find any helpful documentation.
Could someone please help me resolve this error?
This is the code:
from gensim.models import word2vec

num_step = 20  # number of GRU time steps
word_input = 100
paperid_title = {}
with open("gene/paper_title.txt", encoding='utf-8') as adictfile:  # open the file
    for line in adictfile:  # loop over each line
        toks = line.strip().split("\t")  # strip surrounding whitespace, then split on tabs into a list
        if len(toks) == 2:
            paperid_title[toks[0]] = toks[1]  # map each paper id to its title: {'id': 'paper_name'}
save_model_name = "gene/word2vec.model"
model = word2vec.Word2Vec.load(save_model_name)  # load a previously saved model
paper_vec = {}
paper_len = {}
for paperid in paperid_title:  # loop over the ids in paperid_title
    split_cut = paperid_title[paperid].split()  # list of the words in the title
    words_vec = []
    for j in split_cut:
        if (len(words_vec) < num_step) and (j in model):
            words_vec.append(model[j])
I solved it (somewhat). The issue was that I was using a newer version of the package, whereas the code was written for an older version. So I used Google Colab to install an older version of the package.
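An alternative to downgrading, sketched under the assumption that the newer package is gensim 4.x: there, membership tests and vector lookup are done on the model's wv attribute (the keyed vectors) rather than on the Word2Vec object itself. The function name title_vectors is hypothetical; the loop mirrors the one in the question.

```python
def title_vectors(title, model, num_step=20):
    """Collect up to num_step word vectors for the words of a title."""
    # Assumption: a gensim Word2Vec model exposes its vectors as model.wv.
    # Objects without a wv attribute are used directly as the lookup table.
    vectors = getattr(model, "wv", model)
    words_vec = []
    for word in title.split():
        if len(words_vec) >= num_step:
            break
        if word in vectors:                  # replaces `j in model`
            words_vec.append(vectors[word])  # replaces `model[j]`
    return words_vec
```

In the original script this would correspond to changing (j in model) to (j in model.wv) and model[j] to model.wv[j].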

Cluttered, uninterpretable output from PytagCloud in Python

I am trying to create a tag cloud in python using pytagcloud and I am using the following code to generate it:
from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts
with open("fileName.txt") as file:
    Data1 = file.read().lower()
    Data = Data1.split()
    Data = "%s " * len(Data) % tuple(Data)
tags = make_tags(get_tag_counts(Data), maxsize=150)
create_tag_image(tags, 'cloud_large.png', size=(1200, 800))
The code runs without errors (though it takes a while), but the output file it generates is quite cluttered and hard to read.
Why am I getting this unreadable, matrix-like clutter in the center? How can I get rid of it?
The tag cloud also doesn't appear to be centered in the image; how can I center it?
Any help would be greatly appreciated.
P.S. - I am using Python 2.7
If it's still relevant: what I did to solve this was to set the minsize parameter and filter out all the smallest words (which probably appear only once in the text). I guess it happens because of an explosion in the number of words.
My code looks like this:
tags = make_tags(get_tag_counts(MY_TEXT), maxsize=120, minsize=5)
tags = [a for a in tags if a['size'] > 7]
create_tag_image(tags, 'images/cloud_large.png', size=(900, 600), fontname='Reenie Beanie', background=(0,0,0))
I chose the values empirically.
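The same idea can be applied before the tags are built, by dropping rare words from the counts. This is a rough sketch using only the standard library; frequent_word_counts and its thresholds are hypothetical, and it assumes make_tags accepts a list of (word, count) pairs like the one get_tag_counts returns.

```python
import re
from collections import Counter


def frequent_word_counts(text, min_count=2, max_words=100):
    """Return (word, count) pairs, most frequent first, without rare words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Keep at most max_words entries and drop anything below min_count,
    # so one-off words do not crowd the image.
    return [(word, n) for word, n in counts.most_common(max_words) if n >= min_count]
```

The result could then be passed to make_tags in place of get_tag_counts(MY_TEXT); as with the values above, min_count and max_words would need tuning for the text at hand.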

Python: Create a "Table Of Contents" with python-docx/lxml

I'm trying to automate the creation of .docx files (WordML) with the help of python-docx (https://github.com/mikemaccana/python-docx). My current script creates the ToC manually with the following loop:
for chapter in myChapters:
body.append(paragraph(chapter.text, style='ListNumber'))
Does anyone know of a way to use the "word built-in" ToC-function, which adds the index automatically and also creates paragraph-links to the individual chapters?
Thanks a lot!
The key challenge is that a rendered ToC depends on pagination to know what page number to put for each heading. Pagination is a function provided by the layout engine, a very complex piece of software built into the Word client. Writing a page layout engine in Python is probably not a good idea, definitely not a project I'm planning to undertake anytime soon :)
The ToC is composed of two parts:
1. the element that specifies the ToC placement and things like which heading levels to include.
2. the actual visible ToC content, headings and page numbers with dotted lines connecting them.
Creating the element is pretty straightforward and relatively low-effort. Creating the actual visible content, at least if you want the page numbers included, requires the Word layout engine.
These are the options:
1. Just add the tag and a few other bits to signal Word the ToC needs to be updated. When the document is first opened, a dialog box appears saying links need to be refreshed. The user clicks Yes and Bob's your uncle. If the user clicks No, the ToC title appears with no content below it and the ToC can be updated manually.
2. Add the tag and then engage a Word client, by means of C# or Visual Basic against the Word Automation library, to open and save the file; all the fields (including the ToC field) get updated.
3. Do the same thing server-side if you have a SharePoint instance or whatever that can do it with Word Automation Services.
4. Create an AutoOpen macro in the document that automatically runs the field update when the document is opened. Probably won't pass a lot of virus checkers and won't work on locked-down Windows builds common in a corporate setting.
Here's a very nice set of screencasts by Eric White that explain all the hairy details.
Sorry for adding a comment to an old post, but I think it may be helpful.
This is not my solution; I found it here: https://github.com/python-openxml/python-docx/issues/36
Thanks to https://github.com/mustash and https://github.com/scanny
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
paragraph = self.document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = 'TOC \\o "1-3" \\h \\z \\u' # change 1-3 depending on heading levels you need
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar2.append(fldChar3)
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
p_element = paragraph._p
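The instruction text assigned to instrText above is an ordinary Word TOC field code. The sketch below builds that string from parameters to make the switches easier to read; toc_instruction is a hypothetical helper, not part of python-docx. As I understand Word's field codes, \o selects the range of heading levels, \h makes the entries hyperlinks, \z hides tab leaders and page numbers in Web Layout view, and \u uses the applied paragraph outline level.

```python
def toc_instruction(first_level=1, last_level=3, hyperlinks=True):
    """Build a TOC field instruction such as: TOC \\o "1-3" \\h \\z \\u"""
    if not 1 <= first_level <= last_level <= 9:
        raise ValueError("heading levels must satisfy 1 <= first <= last <= 9")
    parts = ["TOC", '\\o "%d-%d"' % (first_level, last_level)]
    if hyperlinks:
        parts.append("\\h")      # entries link to their headings
    parts += ["\\z", "\\u"]      # same trailing switches as in the snippet above
    return " ".join(parts)
```

With the defaults this returns the same string as the literal used above, so instrText.text = toc_instruction() would be a drop-in replacement, and toc_instruction(1, 2) would limit the table to two heading levels.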
Please see explanations in the code comments.
# First set directory where you want to save the file
import os
os.chdir("D:/")
# Now import required packages
import docx
from docx import Document
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
# Initialising document to make word file using python
document = Document()
# Code for making Table of Contents
paragraph = document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = 'TOC \\o "1-3" \\h \\z \\u' # change 1-3 depending on heading levels you need
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar2.append(fldChar3)
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
p_element = paragraph._p
# Giving headings that need to be included in Table of contents
document.add_heading("Network Connectivity")
document.add_heading("Weather Stations")
# Saving the word file by giving name to the file
name = "mdh2"
document.save(name+".docx")
# Now check the Word file that was created
# Select the "Right-click to update field." text
# Right-click it and choose the Update Field option,
# then click Update entire table
# You will now find an automatic Table of Contents
#Mawg // Updating ToC
I had the same issue updating the ToC and googled for it. Not my code, but it works:
import win32com.client

word = win32com.client.DispatchEx("Word.Application")
doc = word.Documents.Open(input_file_name)  # input_file_name: path to the .docx file
doc.TablesOfContents(1).Update()
doc.Close(SaveChanges=True)
word.Quit()
