I just started learning Python and need help with the python-docx library.
I'm using Python 3.5.1.
This is the code I'd like to run from a .py file:
from docx import Document
document = Document
paragraph = document.add_paragraph('I am adding a new paragraph here.')
document.save('test-thu18feb-b.docx')
After pressing F5, I get this message in the python shell:
Traceback (most recent call last):
  File "C:/Users/Schauer/AppData/Local/Programs/Python/Python35/docx-test-thu18feb-a.py", line 4, in <module>
    paragraph = document.add_paragraph('I am adding a new paragraph here.')
AttributeError: 'function' object has no attribute 'add_paragraph'
Thanks a lot for helping out!
The statement
document = Document
assigns the function docx.Document to document.
document = Document()
assigns the value returned by the function docx.Document to document. You need the latter.
docx.Document is a constructor function. It returns instances of the docx.document.Document class.
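The difference between the two assignments can be reproduced without python-docx. This is only a minimal sketch: `make_document` and `FakeDocument` are hypothetical stand-ins for `docx.Document` and the object it returns, not part of the library.

```python
class FakeDocument:
    """Hypothetical stand-in for the object returned by docx.Document()."""

    def __init__(self):
        self.paragraphs = []

    def add_paragraph(self, text):
        self.paragraphs.append(text)
        return text


def make_document():
    """Hypothetical stand-in for the docx.Document factory function."""
    return FakeDocument()


not_called = make_document   # the function object itself (no parentheses)
called = make_document()     # the object that the function returns

print(type(not_called).__name__)              # function
print(hasattr(not_called, "add_paragraph"))   # False
print(called.add_paragraph("I am adding a new paragraph here."))
```

Only `called` has an `add_paragraph` method; calling it on `not_called` raises the same kind of AttributeError as in the question.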
Per the docs, this is the definition of the docx.Document function:
def Document(docx=None):
    """
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a ``.docx`` file (a string) or a file-like object. If
    *docx* is missing or ``None``, the built-in default document "template"
    is loaded.
    """
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    if document_part.content_type != CT.WML_DOCUMENT_MAIN:
        tmpl = "file '%s' is not a Word file, content type is '%s'"
        raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document
So docx.Document is a function, but docx.document.Document is a class.
Since you import
from docx import Document
Document refers to docx.Document in your code.
The console keeps telling me this, saying there is a parameter missing.
"D:\Program Files (x86)\Python\python.exe" D:/workspace/Glossary_Builder_Python/main.py
Please input the file path
C:\Users\Administrator\Desktop\Allergies.docx
Traceback (most recent call last):
File "D:/workspace/Glossary_Builder_Python/main.py", line 102, in <module>
main(sys.argv)
File "D:/workspace/Glossary_Builder_Python/main.py", line 98, in main
extractWdFrmDocx(filepath)
File "D:/workspace/Glossary_Builder_Python/main.py", line 18, in extractWdFrmDocx
document = Document(file)
TypeError: __init__() missing 1 required positional argument: 'part'
I am trying to extract highlighted (in yellow) text from a .docx file, using python-docx and Python 3.7.
When I step into Document, its __init__ looks like this:
def __init__(self, element, part):
    super(Document, self).__init__(element)
    self._part = part
    self.__body = None
so here, what is 'part' for?
Below is the extracting function and main function:
def extractWdFrmDocx(filepath):
    # self.filepath = filepath
    document = Document(filepath)
    for para in document.paragraphs:
        for run in para.runs:
            if run.font.highlight_color == WD_COLOR_INDEX.YELLOW:
                keyText.append(run.text)
    print(keyText)

def main(argv):
    print("Please input the file path")
    filepath = input()
    extractWdFrmDocx(filepath)

if __name__ == "__main__":
    main(sys.argv)
You probably imported Document from docx.document. You are not supposed to construct that class directly. Instead, python-docx provides the factory function docx.Document, which takes a single argument, exactly the way you are calling it now.
Therefore your code should be:
import docx
[...]
document = docx.Document(filepath)
From documentation of python-docx:
Document objects
class docx.document.Document
[...]
Not intended to be constructed directly. Use docx.Document() to open or create a document.
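Once the document is opened through the factory function, the extraction loop from the question works on whatever it returns. Below is a minimal sketch of that loop written against a fake document, so it runs without a .docx file. The helper names `extract_highlighted` and `fake_run` are made up for illustration; with python-docx you would pass `docx.Document(filepath)` and `docx.enum.text.WD_COLOR_INDEX.YELLOW` instead of the fakes.

```python
from types import SimpleNamespace


def extract_highlighted(document, color):
    """Return the text of every run whose highlight colour equals `color`."""
    return [
        run.text
        for para in document.paragraphs
        for run in para.runs
        if run.font.highlight_color == color
    ]


def fake_run(text, color):
    """Build an object shaped like a python-docx run (demo only)."""
    return SimpleNamespace(text=text, font=SimpleNamespace(highlight_color=color))


# Fake document: two paragraphs, two of the three runs are "highlighted".
fake_document = SimpleNamespace(
    paragraphs=[
        SimpleNamespace(runs=[fake_run("allergy", "YELLOW"), fake_run(" is ", None)]),
        SimpleNamespace(runs=[fake_run("pollen", "YELLOW")]),
    ]
)

key_text = extract_highlighted(fake_document, "YELLOW")
print(key_text)
```

Returning the list, rather than appending to a global such as keyText, keeps the function usable for more than one file.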
I have a function that sends a GET request and parses the response as XML:
def get_object(object_name):
    ...
    ...
    #parse xml file
    encoded_text = response.text.encode('utf-8', 'replace')
    root = ET.fromstring(encoded_text)
    tree = ET.ElementTree(root)
    return tree
Then I call this function in a loop over a list of objects to get the XML documents and store them in a variable:
jx_task_tree = ''
for jx in jx_tasks_lst:
    jx_task_tree += str(get_object(jx))
I am not sure whether the function returns the data in a form I can use later the way I need to.
When I want to parse variable jx_task_tree like this:
parser = ET.XMLParser(encoding="utf-8")
print(type(jx_task_tree))
tree = ET.parse(jx_task_tree, parser=parser)
print(ET.tostring(tree))
it throws me an error:
Traceback (most recent call last):
File "import_uac_wf.py", line 59, in <module>
tree = ET.parse(jx_task_tree, parser=parser)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1182, in
parse
tree.parse(source, parser)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 647, in parse
source = open(source, "rb")
IOError: [Errno 36] File name too long:
'<xml.etree.ElementTree.ElementTree
object at 0x7ff2607c8910>\n<xml.etree.ElementTree.ElementTree object at
0x7ff2607e23d0>\n<xml.etree.ElementTree.ElementTree object at
0x7ff2607ee4d0>\n<xml.etree.ElementTree.ElementTree object at
0x7ff2607d8e90>\n<xml.etree.ElementTree.ElementTree object at
0x7ff2607e2550>\n<xml.etree.ElementTree.ElementTree object at
0x7ff2607889d0>\n<xml.etree.ElementTree.ElementTree object at
0x7ff26079f3d0>\n'
Could anybody tell me what get_object() should return and how to work with the result later, so that the returned values can be joined into one variable and parsed?
Regarding your current exception:
According to [Python 3.Docs]: xml.etree.ElementTree.parse(source, parser=None) (emphasis is mine):
Parses an XML section into an element tree. source is a filename or file object containing XML data.
If you want to load the XML from a string, use ET.fromstring instead.
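A minimal sketch of the two entry points, using a made-up XML snippet: `ET.fromstring` takes the XML text itself, while `ET.parse` takes a filename or a file object (here an in-memory `io.StringIO`).

```python
import io
import xml.etree.ElementTree as ET

# Made-up XML text, for illustration only.
xml_text = "<task><name>demo</name></task>"

root = ET.fromstring(xml_text)           # string in, Element out
tree = ET.parse(io.StringIO(xml_text))   # file object in, ElementTree out

print(root.tag)
print(tree.getroot().find("name").text)
```

Passing the XML text itself to `ET.parse` makes it try to open a file with that name, which is what produced the IOError above.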
Then, as you suspected, the 2nd code snippet is wrong:
get_object(jx) returns already parsed XML, i.e. an ElementTree object
Calling str on it yields its textual representation (e.g. "<xml.etree.ElementTree.ElementTree object at 0x7ff26079f3d0>"), which is not what you want
You could do something like:
jx_tasks_string = ""
for jx in jx_tasks_lst:
    jx_tasks_string += ET.tostring(get_object(jx).getroot())
Since jx_tasks_string is the concatenation of some strings obtained from parsing some XML blobs, there's no reason to parse it again.
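If a single parseable document is needed after all, one option is to collect the roots under a common parent element instead of concatenating strings. Several root elements glued together are not well-formed XML. This is a sketch under assumptions: `get_object` here is a stand-in that parses a literal string instead of sending a request, and the `tasks` wrapper element is made up. It also targets Python 3, where `ET.tostring` returns bytes unless `encoding="unicode"` is passed.

```python
import xml.etree.ElementTree as ET


def get_object(xml_text):
    """Stand-in for the real get_object(): parses text, sends no request."""
    return ET.ElementTree(ET.fromstring(xml_text))


# Made-up responses, for illustration only.
responses = ["<task name='a'/>", "<task name='b'/>"]

combined = ET.Element("tasks")            # hypothetical wrapper element
for text in responses:
    combined.append(get_object(text).getroot())

# encoding="unicode" yields a str instead of bytes on Python 3.
serialized = ET.tostring(combined, encoding="unicode")
reparsed = ET.fromstring(serialized)

print(serialized)
print(len(reparsed))
```

Keeping the Element objects in a list and working on them directly avoids the serialize-and-reparse round trip entirely.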
TL;DR: I'm trying to pass an XML object (using ET) to a Comtypes (SAPI) object in python 3.7.2 on Windows 10. It's failing due to invalid chars (see error below). Unicode characters are read correctly from the file, can be printed (but do not display correctly on the console). It seems like the XML is being passed as ASCII or that I'm missing a flag? (https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ee431843(v%3Dvs.85)). If it is a missing flag, how do I pass it? (I haven't figured that part out yet..)
Long form description
I'm using Python 3.7.2 on Windows 10 and trying to create an XML (SSML: https://www.w3.org/TR/speech-synthesis/) file to use with Microsoft's speech API. The voice struggles with certain words, and when I looked at the SSML format I saw that it supports a phoneme tag, which lets you specify how to pronounce a given word. Microsoft implements parts of the standard (https://learn.microsoft.com/en-us/cortana/skills/speech-synthesis-markup-language#phoneme-element), so I found a UTF-8 encoded library containing IPA pronunciations. When I try to call SAPI with parts of the code replaced, I get the following error:
Traceback (most recent call last):
File "pdf_to_speech.py", line 132, in <module>
audioConverter(text = "Hello world extended test",outputFile = output_file)
File "pdf_to_speech.py", line 88, in __call__
self.engine.speak(text)
_ctypes.COMError: (-2147200902, None, ("'ph' attribute in 'phoneme' element is not valid.", None, None, 0, None))
I've been trying to debug, but when I print the pronunciations of the words the characters are boxes. However if I copy and paste them from my console, they look fine (see below).
həˈloʊ,
ˈwɝːld
ɪkˈstɛndəd,
ˈtɛst
Best Guess
I'm unsure whether the problem is caused by one of the changes I made:
1) I changed Python versions to be able to print Unicode
2) I fixed problems with reading the file
3) I manipulated the string incorrectly
I'm pretty sure the problem is that I'm not passing the text as Unicode to the comtypes object. The ideas I'm looking into are:
1) Is there a flag missing?
2) Is it being converted to ASCII when it's being passed to comtypes (C types error)?
3) Is the XML being passed incorrectly, or am I missing a step?
Sneak peek at the code
This is the class that reads the IPA dictionary and then generates the XML file. Look at _load_phonemes and _pronounce.
class SSML_Generator:
    def __init__(self,pause,phonemeFile):
        self.pause = pause
        if isinstance(phonemeFile,str):
            print("Loading dictionary")
            self.phonemeDict = self._load_phonemes(phonemeFile)
            print(len(self.phonemeDict))
        else:
            self.phonemeDict = {}
    def _load_phonemes(self, phonemeFile):
        phonemeDict = {}
        with io.open(phonemeFile, 'r',encoding='utf-8') as f:
            for line in f:
                tok = line.split()
                #print(len(tok))
                phonemeDict[tok[0].lower()] = tok[1].lower()
        return phonemeDict
    def __call__(self,text):
        SSML_document = self._header()
        for utterance in text:
            parent_tag = self._pronounce(utterance,SSML_document)
            #parent_tag.tail = self._pause(parent_tag)
            SSML_document.append(parent_tag)
        ET.dump(SSML_document)
        return SSML_document
    def _pause(self,parent_tag):
        return ET.fromstring("<break time=\"150ms\" />") # ET.SubElement(parent_tag,"break",{"time":str(self.pause)+"ms"})
    def _header(self):
        return ET.Element("speak",{"version":"1.0", "xmlns":"http://www.w3.org/2001/10/synthesis", "xml:lang":"en-US"})
    # TODO: Add rate https://learn.microsoft.com/en-us/cortana/skills/speech-synthesis-markup-language#prosody-element
    def _rate(self):
        pass
    # TODO: Add pitch
    def _pitch(self):
        pass
    def _pronounce(self,word,parent_tag):
        if word in self.phonemeDict:
            sys.stdout.buffer.write(self.phonemeDict[word].encode("utf-8"))
            return ET.fromstring("<phoneme alphabet=\"ipa\" ph=\"" + self.phonemeDict[word] + "\"> </phoneme>")#ET.SubElement(parent_tag,"phoneme",{"alphabet":"ipa","ph":self.phonemeDict[word]})#<phoneme alphabet="string" ph="string"></phoneme>
        else:
            return parent_tag
    # Nice to have: Transform acronyms into their pronunciation (See say as tag)
I've also added how the code writes to the comtype object (SAPI) in case the error is there.
def __call__(self,text,outputFile):
    # https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms723606(v%3Dvs.85)
    self.stream.Open(outputFile + ".wav", self.SpeechLib.SSFMCreateForWrite)
    self.engine.AudioOutputStream = self.stream
    text = self._text_processing(text)
    text = self.SSML_generator(text)
    text = ET.tostring(text,encoding='utf8', method='xml').decode('utf-8')
    self.engine.speak(text)
    self.stream.Close()
Thanks in advance for your help!
Try using single quotes around the value of the ph attribute,
like this:
my_text = '<speak><phoneme alphabet="x-sampa" ph=\'v"e.de.ni.e\'>ведение</phoneme></speak>'
Also remember to escape the single quotes with \ inside a single-quoted Python string.
Update:
This error can also mean that the value of ph cannot be parsed. You can check the docs here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup
This example will work:
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-Jessa24kRUS">
<s>His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme></s>
</voice>
</speak>
but this doesn't
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-Jessa24kRUS">
<s>His name is Mike <phoneme alphabet="ups" ph="JHU AUA"> Zhou </phoneme></s>
</voice>
</speak>
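As an alternative to hand-escaping quotes inside ph, the element can be built through the ElementTree API, which escapes attribute values when serializing. This is a sketch, not a confirmed fix for the COM error: it only shows that the attribute value survives a round trip, and whether the SAPI parser accepts the resulting `&quot;` entity is an untested assumption.

```python
import xml.etree.ElementTree as ET

speak = ET.Element("speak")
# The attribute value contains a double quote; no manual escaping is needed.
phoneme = ET.SubElement(
    speak, "phoneme", {"alphabet": "x-sampa", "ph": 'v"e.de.ni.e'}
)
phoneme.text = "ведение"

# encoding="unicode" returns a str, so no encode/decode step is required.
ssml = ET.tostring(speak, encoding="unicode")
print(ssml)

# Round trip: the parsed attribute matches the original value.
print(ET.fromstring(ssml).find("phoneme").get("ph"))
```

This mirrors the commented-out `ET.SubElement(...)` call in `_pronounce`, which avoids assembling the tag by string concatenation.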
I am using python-docx and am trying to insert a <w:bookmarkStart> tag. I do not see any API method that creates the tag, so I googled several references on gaining access to the raw XML through the document._document_part attribute. However, when I attempt to use it, Python tells me it does not exist:
>>> import docx
>>> document = docx.Document()
>>> print document._document_part
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Document' object has no attribute '_document_part'
I am using python-docx 0.8.5.
Is there a method to add a <w:bookmarkStart> tag?
I found the solution. Here's an example:
from docx.oxml.shared import OxmlElement # Necessary Import
tags = document.element.xpath('//w:r') # Locate the right <w:r> tag
tag = tags[0] # Specify which <w:r> tag you want
child = OxmlElement('w:ARBITRARY') # Create arbitrary tag
tag.append(child) # Append in the new tag
To add an attribute:
from docx.oxml.shared import qn
child.set( qn('w:val'), 'VALUE') # Add in the value
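Applied to the original question, the same two helpers can build a bookmark. The sketch below assumes that a bookmark needs a `w:id` and a `w:name` attribute on `<w:bookmarkStart>` and a `<w:bookmarkEnd>` with the same `w:id`. The function name `add_bookmark` is made up, and `paragraph._p` is a private python-docx attribute that may change between versions.

```python
def add_bookmark(paragraph, name, bookmark_id):
    """Append a <w:bookmarkStart>/<w:bookmarkEnd> pair to a python-docx paragraph."""
    # Imported lazily so this sketch can be loaded without python-docx installed.
    from docx.oxml.shared import OxmlElement, qn

    start = OxmlElement("w:bookmarkStart")
    start.set(qn("w:id"), str(bookmark_id))
    start.set(qn("w:name"), name)

    end = OxmlElement("w:bookmarkEnd")
    end.set(qn("w:id"), str(bookmark_id))

    # paragraph._p is the underlying <w:p> element (private attribute).
    paragraph._p.append(start)
    paragraph._p.append(end)
    return start, end
```

A call would look like `add_bookmark(document.add_paragraph("text"), "my_bookmark", 0)`. Because both elements are appended at the end of the paragraph, the bookmark here marks a position rather than wrapping the paragraph's text.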
I am trying to read the variables from newreg.py (e.g. state, district, dcode, and so on; a long list of values picked up from a web form) into insertNew.py.
I have currently read the whole file into a list named 'lines'. Now, how do I filter each variable (state, district, etc., approx. 50-55 variables) out of 'lines'? The list also contains HTML code, as I have read the whole web page into it.
Is there a better, more efficient way to do it?
Once I am able to read each variable, I need to concatenate these values (converted into strings) and insert them into MongoDB.
Lastly, when the data has been inserted into the DB, the 'home.py' page opens.
I am giving these details so that the complete picture is available for anyone suggesting a solution. I hope I have been able to keep it simple as well as complete.
I want to loop over the list (sample below) and filter out the variables (the names before the '=' sign). The following is in 'newreg.py':
state = form.getvalue('state','ERROR')
district = form.getvalue('district','ERROR')
dcode = form.getvalue('Dcode','ERROR')
I read a file / page into a list
fp = open('/home/dev/wsgi-scripts/newreg.py','r')
lines = fp.readlines()
so that I can create a dictionary to insert into MongoDB, e.g.
info = {'state' : state , 'district' : district, . . . . }
that is, {key : value}, where each value is one of the variables from the list above.
Thanks
But I am getting the following error when I run
print getattr(newreg, 'state')
the error is
>>> print getattr(newreg, 'state')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'state'
I also tried
>>> print newreg.state
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'state'
This is how I added the module
>>> import os,sys
>>> sys.path.append('/home/dev/wsgi-scripts/')
>>> import newreg
>>> newreg_vars = dir(newreg)
>>> print newreg_vars
['Connection', 'Handler', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', 'application', 'cgi', 'datetime', 'os', 'sys', 'time']
Handler in the above list is a class, defined as follows:
#!/usr/bin/env python
import os, sys
import cgi
from pymongo import Connection
import datetime
import time
class Handler:
    def do(self, environ, start_response):
        form = cgi.FieldStorage(fp=environ['wsgi.input'],
                                environ=environ)
        state = form.getvalue('state','<font color="#FF0000">ERROR</font>')
        district = form.getvalue('district','<font color="#FF0000">ERROR</font>')
        dcode = form.getvalue('Dcode','<font color="#FF0000">ERROR</font>')
I am assuming you want to copy the variables from one Python module to another at runtime.
import newreg
newreg_vars = dir(newreg)
print newreg_vars
will print all of the attributes of the module "newreg".
To read the variables from the module:
print getattr(newreg, 'state')
print getattr(newreg, 'district')
print getattr(newreg, 'dcode')
or if you know the names of the attributes:
print newreg.state
print newreg.district
print newreg.dcode
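To convert the attribute values into strings, use a list comprehension (or a generator). Note that dir() returns only the attribute names, so each value has to be looked up with getattr:
newreg_strings = [str(getattr(newreg, name)) for name in newreg_vars]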
This will save you lots of effort, as you will not have to parse "newreg" as a text file with re.
As a side note: Type conversion is not concatenation (although concatenation may involve type conversion in some other programming languages).
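One thing to check when getattr raises AttributeError: dir() and getattr() only see names bound at module level. Names assigned inside a function or method (such as state inside Handler.do) are locals and never become module attributes. A minimal sketch, where the module `newreg_demo` and its contents are made up for illustration:

```python
import types

# Source of a made-up module: one module-level name, one function-local name.
source = '''
region = "module-level value"

def handler():
    state = "local value"
    return state
'''

demo = types.ModuleType("newreg_demo")
exec(source, demo.__dict__)   # run the source as the module's body

print(hasattr(demo, "region"))   # True: bound at module level
print(hasattr(demo, "state"))    # False: exists only while handler() runs
print(demo.handler())            # the local value is reachable via the return value
```

To use such values from another module, the function has to return them (or store them on an object) rather than leave them as locals.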